Unified Attention
Architecture
Used in: 2 PRs
Best BPB: 1.1088
Avg BPB: 1.1250
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 1202 | {"layers":11,"dimension":528,"heads":4} |
| 1270 | — |
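
As a rough illustration of how the recorded parameters for PR 1202 might be read, the sketch below parses the JSON cell and derives a per-head width. It assumes that `dimension` is the model width and that it is split evenly across `heads`; the page does not confirm either point, and the variable names are hypothetical.

```python
import json

# Parameters cell copied from the PR 1202 row of the table.
params_cell = '{"layers":11,"dimension":528,"heads":4}'
params = json.loads(params_cell)

# Assumption (not stated on the page): "dimension" is the model width
# and is divided evenly among the attention heads.
head_dim, remainder = divmod(params["dimension"], params["heads"])
if remainder != 0:
    raise ValueError("dimension is not divisible by heads")

print(f"layers={params['layers']} head_dim={head_dim}")
```

Under that assumption, each of the 4 heads would be 132 wide. PR 1270 has no recorded parameters, so nothing comparable can be derived for it.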