← Back to Architecture

causal self-attention

Architecture
Used in
1 PRs
Best BPB
1.4574
Avg BPB
1.4574

Hyperparameters Across PRs

pr_numberparameters
476{"layers":2,"heads":8}