SwiGLU MLP

Architecture
Used in: 5 PRs
Best BPB: 1.1558
Avg BPB: 1.2220

Hyperparameters Across PRs

pr_number  parameters
131        {"hidden":1024}
163        {"layers":7,"dim":576,"mlp_mult":2}
391        {"hidden_size":1280}
395        {"hidden_size":1280}
507        {"expansion_factor":3}
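For orientation, here is a minimal NumPy sketch of a SwiGLU MLP block in its commonly used form: a SiLU-gated projection multiplied elementwise with a second "up" projection, then projected back to the model width. It is not taken from any of the PRs listed above. The function and weight names (swiglu_mlp, w_gate, w_up, w_down) are hypothetical, and reading dim and mlp_mult from PR 163 as "inner width = dim * mlp_mult" is an assumption the table does not confirm.

```python
import numpy as np


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: (silu(x @ w_gate) * (x @ w_up)) @ w_down."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


# Values borrowed from the PR 163 row. Treating mlp_mult as the ratio of
# inner width to model width is an assumption, not something the table states.
dim, mlp_mult = 576, 2
hidden = dim * mlp_mult

rng = np.random.default_rng(0)
w_gate = rng.standard_normal((dim, hidden)) * 0.02  # gate projection
w_up = rng.standard_normal((dim, hidden)) * 0.02    # value ("up") projection
w_down = rng.standard_normal((hidden, dim)) * 0.02  # output ("down") projection

x = rng.standard_normal((4, dim))  # 4 token vectors of width dim
y = swiglu_mlp(x, w_gate, w_up, w_down)
print(y.shape)  # (4, 576)
```

Under this reading a SwiGLU block carries three weight matrices (3 * dim * hidden parameters, ignoring biases) rather than the two of a plain MLP, which may be one reason the PRs above settle on different width settings.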