SwiGLU
Architecture
Used in: 16 PRs
Best BPB: 1.1175
Avg BPB: 1.2578
Submissions
| PR | Author | BPB |
|---|---|---|
| #81 | polarizedfortnite-cpu | 1.1670 |
| #340 | starfly-web | 1.2182 |
| #373 | JoeProAI | 1.1634 |
| #377 | Complexity-ML | 1.4072 |
| #430 | sahiee-dev | 1.1428 |
| #509 | andrewbaggio1 | 1.1175 |
| #584 | ssatia | 1.1233 |
| #661 | andrewbaggio1 | 1.1175 |
| #799 | yuvraajbains | 1.2005 |
| #1025 | Zagot-byte | 1.3579 |
| #1090 | swapp1990 | 1.1573 |
| #1235 | maksblu | 1.3527 |
| #1391 | Abhinav-Avasarala | 1.4716 |
| #1393 | Abhinav-Avasarala | 1.4716 |
| #1418 | Park-Tae-Hwan | 1.4192 |
| #1428 | ntwari-bruce | 1.2370 |

Hyperparameters Across PRs

| PR | Parameters |
|---|---|
| 81 | {"mlp_hidden_dim":1024} |
| 340 | — |
| 373 | — |
| 377 | {"experts":4} |
| 430 | {"hidden":938} |
| 509 | {"layers":11,"hidden":1792} |
| 584 | — |
| 661 | {"hidden":1792} |
| 799 | — |
| 1025 | {"hidden":341} |
| 1090 | {"mlp_multiplier":3} |
| 1235 | {"mlp_mult":4} |
| 1391 | {"MLP_MULT":1} |
| 1393 | {"mlp_mult":1} |
| 1418 | — |
| 1428 | {"layers":9,"model_dim":512,"num_heads":8,"num_kv_heads":4,"mlp_mult":2} |
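
Most of the keys in the table (`mlp_hidden_dim`, `hidden`, `mlp_mult`, `mlp_multiplier`, `MLP_MULT`) appear to describe the width of the SwiGLU feed-forward block. As a rough illustration, the sketch below implements the commonly used bias-free SwiGLU formulation, `down(SiLU(gate(x)) * up(x))`, in dependency-free Python. All names here (`SwiGLU`, `hidden_from_mult`, `swiglu_param_count`) are hypothetical, and the rule `hidden = mlp_mult * model_dim` is an assumption; the individual PRs may derive the hidden width differently.

```python
import math
import random


def silu(z: float) -> float:
    """SiLU (Swish) activation: z * sigmoid(z). Fine for moderate |z|."""
    return z / (1.0 + math.exp(-z))


def matvec(weight: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def init_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    """Small uniform random weights; this init scheme is arbitrary."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def hidden_from_mult(model_dim: int, mlp_mult: int) -> int:
    """ASSUMPTION: hidden width is mlp_mult * model_dim (not confirmed by the PRs)."""
    return mlp_mult * model_dim


def swiglu_param_count(model_dim: int, hidden_dim: int) -> int:
    """Weights in a bias-free SwiGLU block: gate + up + down projections."""
    return 3 * model_dim * hidden_dim


class SwiGLU:
    """Hypothetical bias-free SwiGLU block operating on a single vector."""

    def __init__(self, model_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_gate = init_matrix(hidden_dim, model_dim, rng)  # model -> hidden
        self.w_up = init_matrix(hidden_dim, model_dim, rng)    # model -> hidden
        self.w_down = init_matrix(model_dim, hidden_dim, rng)  # hidden -> model

    def __call__(self, x: list[float]) -> list[float]:
        gate = matvec(self.w_gate, x)
        up = matvec(self.w_up, x)
        # Elementwise gating: the SiLU-activated branch scales the linear branch.
        gated = [silu(g) * u for g, u in zip(gate, up)]
        return matvec(self.w_down, gated)

    def num_params(self) -> int:
        return sum(len(m) * len(m[0]) for m in (self.w_gate, self.w_up, self.w_down))
```

Under the stated assumption, the `model_dim` of 512 and `mlp_mult` of 2 listed for PR 1428 would give `hidden_from_mult(512, 2) == 1024`, and `swiglu_param_count(512, 1024)` would report 1,572,864 weights per block. Because SwiGLU uses three projections rather than two, a given multiplier costs about 1.5 times the parameters of a plain two-layer MLP of the same hidden width.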