SwiGLU

Category: Architecture
Used in: 16 PRs
Best BPB: 1.1175
Avg BPB: 1.2578

Hyperparameters Across PRs

pr_number  parameters
81         {"mlp_hidden_dim":1024}
340        (none listed)
373        (none listed)
377        {"experts":4}
430        {"hidden":938}
509        {"layers":11,"hidden":1792}
584        (none listed)
661        {"hidden":1792}
799        (none listed)
1025       {"hidden":341}
1090       {"mlp_multiplier":3}
1235       {"mlp_mult":4}
1391       {"MLP_MULT":1}
1393       {"mlp_mult":1}
1418       (none listed)
1428       {"layers":9,"model_dim":512,"num_heads":8,"num_kv_heads":4,"mlp_mult":2}
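For orientation, here is a minimal sketch of what the hidden-size settings above usually control in a SwiGLU feed-forward block. It is not taken from any of the listed PRs. The function names (`swiglu_mlp`, `init_weights`, `swiglu_param_count`), the bias-free three-matrix layout, and the reading of `mlp_mult` as `hidden_dim = mlp_mult * model_dim` are all assumptions made for illustration; individual PRs may define these keys differently.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(w_down, hidden)


def init_weights(model_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights for the three projections."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

    # gate and up map model_dim -> hidden_dim; down maps hidden_dim -> model_dim
    return mat(hidden_dim, model_dim), mat(hidden_dim, model_dim), mat(model_dim, hidden_dim)


def swiglu_param_count(model_dim, hidden_dim):
    """Weights in a bias-free SwiGLU block: three model_dim x hidden_dim matrices."""
    return 3 * model_dim * hidden_dim


# Assumed interpretation: hidden_dim = mlp_mult * model_dim
model_dim, mlp_mult = 8, 2
hidden_dim = mlp_mult * model_dim
w_gate, w_up, w_down = init_weights(model_dim, hidden_dim)
y = swiglu_mlp([0.1] * model_dim, w_gate, w_up, w_down)
print(len(y), swiglu_param_count(model_dim, hidden_dim))
```

Because SwiGLU uses three projection matrices rather than the two of a plain MLP, a given `hidden` value costs about 1.5x the parameters of an ungated block of the same width. This may be why some entries use a reduced or non-round hidden size, though the table itself does not say.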