depth/width tradeoff
Architecture
Used in
1 PR
Best BPB
1.3693
Avg BPB
1.3693
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 93 | {"layers":12,"model_dim":384,"num_heads":6,"num_kv_heads":3,"mlp_mult":2} |
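To make the depth/width balance in the PR 93 row easier to read, here is a minimal sketch that derives the per-head and MLP sizes from those hyperparameters and gives a rough weight count for the transformer blocks. It assumes grouped-query attention with `head_dim = model_dim / num_heads`, a plain two-matrix MLP with hidden size `mlp_mult * model_dim`, and no biases; none of that is confirmed by this page. The function names `derived_shapes` and `estimate_block_params` are hypothetical, and embeddings and normalization weights are left out.

```python
import json

# Hyperparameters copied verbatim from the table row for PR 93.
cfg = json.loads(
    '{"layers":12,"model_dim":384,"num_heads":6,"num_kv_heads":3,"mlp_mult":2}'
)


def derived_shapes(cfg):
    """Sizes implied by the config (assumes head_dim = model_dim / num_heads)."""
    head_dim = cfg["model_dim"] // cfg["num_heads"]
    return {
        "head_dim": head_dim,
        # Number of query heads sharing each key/value head (assumed GQA).
        "queries_per_kv_head": cfg["num_heads"] // cfg["num_kv_heads"],
        "kv_dim": cfg["num_kv_heads"] * head_dim,
        "mlp_hidden": cfg["mlp_mult"] * cfg["model_dim"],
    }


def estimate_block_params(cfg):
    """Rough weight count for all blocks: attention plus MLP, no biases."""
    d = cfg["model_dim"]
    shapes = derived_shapes(cfg)
    # Q and output projections are d x d; K and V project to kv_dim.
    attention = 2 * d * d + 2 * d * shapes["kv_dim"]
    # Assumed ungated MLP: one up-projection and one down-projection.
    mlp = 2 * d * shapes["mlp_hidden"]
    return cfg["layers"] * (attention + mlp)


if __name__ == "__main__":
    print(derived_shapes(cfg))
    print(estimate_block_params(cfg))
```

Under these assumptions the configuration gives a head dimension of 64, two query heads per key/value head, and an MLP hidden size of 768. A gated MLP or a different head layout would change the estimate.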