MLP width
Category: Architecture
Used in: 3 PRs
Best BPB: 1.1804
Avg BPB: 1.2471
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 534 | {"hidden_size":1408} |
| 1032 | {"model_dim":576,"layers":11} |
| 1052 | {"from":3,"to":3.5} |
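The table does not say what each parameter controls, so the following is only a rough sketch of how an MLP width change could be sized. It assumes the `from`/`to` values in PR 1052 are MLP expansion multipliers applied to the model dimension, and that the MLP is a plain two-matrix block with no biases or gating. Neither assumption is confirmed by the table. The function names are hypothetical, and `model_dim=576` is borrowed from PR 1032 purely for illustration.

```python
def mlp_hidden_width(model_dim: int, multiplier: float) -> int:
    """Hidden width of the MLP, assuming width = model_dim * multiplier."""
    return int(round(model_dim * multiplier))


def mlp_weight_count(model_dim: int, hidden: int) -> int:
    """Weights in one MLP block: up-projection plus down-projection.

    Assumes no bias terms and no gating matrix (an assumption, not
    something the table states).
    """
    return 2 * model_dim * hidden


# Illustrative only: model_dim taken from PR 1032, multipliers from PR 1052.
model_dim = 576
narrow = mlp_hidden_width(model_dim, 3.0)
wide = mlp_hidden_width(model_dim, 3.5)

print(f"hidden width: {narrow} -> {wide}")
print(f"weights per block: {mlp_weight_count(model_dim, narrow)} "
      f"-> {mlp_weight_count(model_dim, wide)}")
```

Under these assumptions the per-block weight count scales linearly with the multiplier, so moving from 3 to 3.5 would add about 17% to each MLP block.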