11L Transformer
Architecture
Used in: 1 PR
Best BPB: 0.9674
Avg BPB: 0.9674
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 727 | {"layers":11,"d_model":512,"gqa_heads":8,"kv_heads":4,"mlp_multiplier":3} |
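The hyperparameters for PR 727 can be loaded into a small config object to derive the shapes they imply. The sketch below is illustrative, not the submission's code. It assumes that `gqa_heads` is the number of query heads, that `kv_heads` is the number of shared key/value heads (grouped-query attention), and that `mlp_multiplier` scales `d_model` to give the MLP hidden width. The parameter estimate further assumes bias-free projections and a plain two-matrix MLP, and it leaves out embeddings and normalization weights. The names `TransformerConfig` and `block_params` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hyperparameters as listed in the table; field meanings are assumed."""
    layers: int
    d_model: int
    gqa_heads: int       # assumed: number of query heads
    kv_heads: int        # assumed: number of shared key/value heads
    mlp_multiplier: int  # assumed: MLP hidden width = mlp_multiplier * d_model

    def __post_init__(self):
        if self.d_model % self.gqa_heads != 0:
            raise ValueError("d_model must be divisible by gqa_heads")
        if self.gqa_heads % self.kv_heads != 0:
            raise ValueError("gqa_heads must be divisible by kv_heads")

    @property
    def head_dim(self) -> int:
        return self.d_model // self.gqa_heads

    @property
    def queries_per_kv_head(self) -> int:
        return self.gqa_heads // self.kv_heads

    @property
    def mlp_hidden(self) -> int:
        return self.mlp_multiplier * self.d_model


def block_params(cfg: TransformerConfig) -> int:
    """Rough weight count for one block (no biases, norms, or embeddings)."""
    kv_width = cfg.kv_heads * cfg.head_dim
    attn = (
        cfg.d_model * cfg.d_model      # query projection
        + 2 * cfg.d_model * kv_width   # key and value projections
        + cfg.d_model * cfg.d_model    # output projection
    )
    mlp = 2 * cfg.d_model * cfg.mlp_hidden  # up and down projections
    return attn + mlp


# The JSON string copied from the table row for PR 727.
raw = '{"layers":11,"d_model":512,"gqa_heads":8,"kv_heads":4,"mlp_multiplier":3}'
cfg = TransformerConfig(**json.loads(raw))

per_block = block_params(cfg)
total_blocks = cfg.layers * per_block

print("head_dim:", cfg.head_dim)
print("queries per KV head:", cfg.queries_per_kv_head)
print("mlp_hidden:", cfg.mlp_hidden)
print("params per block:", per_block)
print("params in all blocks:", total_blocks)
```

Under these assumptions each head is 64-dimensional, two query heads share each key/value head, and the MLP hidden width is 1536. A gated MLP or a different head layout would change the parameter estimate.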