← Back to Architecture
causal self-attention
ArchitectureUsed in
1 PRs
Best BPB
1.4574
Avg BPB
1.4574
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 476 | {"layers":2,"heads":8} |
| pr_number | parameters |
|---|---|
| 476 | {"layers":2,"heads":8} |