MLP3.5x

Architecture
Used in: 7 PRs
Best BPB: 0.1582
Avg BPB: 0.9007

Hyperparameters Across PRs

pr_number    parameters
344          {"hidden":1792}
544          {"hidden_size":1792}
545          {"hidden_size":512,"mlp_size":1792}
576          {"hidden_dim":1792,"multiplier":3.5}
790          {"multiplier":3.5,"hidden_size":1792}
825          {"mlp_multiplier":3.5}
859          {"multiplier":3.5}
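The recurring values 3.5 and 1792 appear to be related through the model width of 512 recorded for PR 545, since 512 × 3.5 = 1792. The sketch below shows that arithmetic. It assumes that 1792 is the MLP inner width and 512 is the model width; the page does not state this explicitly, and the key names differ from PR to PR. The function name `mlp_width` is hypothetical and does not come from any of the PRs.

```python
def mlp_width(model_dim: int, multiplier: float = 3.5) -> int:
    """Return the MLP inner width for a model width and expansion multiplier.

    Assumption: the inner width is model_dim * multiplier, as the
    PR 545 entry (512 and 1792) suggests.
    """
    width = model_dim * multiplier
    # Reject combinations that do not produce a whole number of units.
    if width != int(width):
        raise ValueError(f"{model_dim} * {multiplier} is not an integer width")
    return int(width)


# Model width 512 with a 3.5x multiplier.
print(mlp_width(512, 3.5))  # 1792
```

Under this reading, the entries that list only a multiplier (PRs 825 and 859) and those that list only 1792 (PRs 344 and 544) could describe the same configuration, though the page alone does not confirm it.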