depth and MLP width increase
Category: Architecture
Used in: 1 PR
Best BPB: 1.2026
Avg BPB: 1.2026
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 426 | {"layers":10,"mlp_mult":3,"hidden_size":1536,"dim":512,"heads":8,"kv_heads":4} |
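The values in the row above appear to be related: `hidden_size` equals `mlp_mult * dim` (3 × 512 = 1536), which suggests it is the MLP hidden width rather than the model width. The following is a minimal sketch that checks this and derives the attention shapes implied by `heads` and `kv_heads`. The helper names `derived_shapes` and `approx_block_params` are hypothetical, and the parameter estimate assumes bias-free Q/K/V/O projections and a plain two-matrix MLP with no gating; PR 426 may be implemented differently.

```python
import json

# Row copied verbatim from the table above (PR 426).
params = json.loads(
    '{"layers":10,"mlp_mult":3,"hidden_size":1536,"dim":512,"heads":8,"kv_heads":4}'
)


def derived_shapes(p):
    """Shapes implied by the hyperparameters (hypothetical helper)."""
    head_dim = p["dim"] // p["heads"]      # per-head width
    kv_dim = p["kv_heads"] * head_dim      # total K (or V) width with grouped KV heads
    mlp_hidden = p["mlp_mult"] * p["dim"]  # MLP hidden width
    return {"head_dim": head_dim, "kv_dim": kv_dim, "mlp_hidden": mlp_hidden}


def approx_block_params(p):
    """Rough weight count for one transformer block.

    Assumption: no biases, no norm weights, Q and O are dim x dim,
    K and V are dim x kv_dim, and the MLP is an up and a down projection.
    """
    d = p["dim"]
    s = derived_shapes(p)
    attn = d * d + 2 * d * s["kv_dim"] + d * d  # Q, K + V, O
    mlp = 2 * d * s["mlp_hidden"]               # up + down
    return attn + mlp


shapes = derived_shapes(params)
print(shapes)                                  # head_dim=64, kv_dim=256, mlp_hidden=1536
print(shapes["mlp_hidden"] == params["hidden_size"])
print(approx_block_params(params) * params["layers"])  # block weights across all layers
```

Under these assumptions each block holds 2,359,296 weights, or 23,592,960 across the 10 layers, excluding embeddings and normalization parameters.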