MLP

Architecture
Used in: 19 PRs
Best BPB: 0.9650
Avg BPB: 1.1830

Hyperparameters Across PRs

pr_number  parameters
301        {"hidden_size":1472}
497        {"MLP_HIDDEN":992}
602        {"layers":10}
606        {"scale":3.5,"activation":"relu²"}
628        {"expansion_factor":3,"hidden_dim":1536,"activation":"ReLU²"}
636        {"expansion":3}
641        {"expansion_factor":4,"hidden_dim":3072,"activation":"relu²"}
653        {"count":3}
873        {"blocks":3}
939        {"expansion":1.875}
1078       {"rank":16}
1246       {"expansion":4}
1354       {"activation":"LeakyReLU","activation_power":2}
1513       {"parameters":524288}
1538       {"experts":4,"top_k":2}
1560
1625       {"blocks":"L5-L10"}
1646       {"multiplier":4.8}
1654       {"layers":2}
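Several rows above describe the MLP by an expansion factor and a squared-ReLU activation. The sketch below shows one plausible reading of those two settings in plain Python. It assumes the hidden width is the expansion factor times the model width, which is consistent with PRs 628 and 641 if their model widths are 512 and 768 (an inference from `hidden_dim / expansion_factor`, not a value stated in the table). The function names `relu_squared`, `hidden_dim`, and `mlp_forward` are hypothetical and do not come from any of the PRs.

```python
import random


def relu_squared(x: float) -> float:
    """Squared ReLU: max(0, x) ** 2 (the "relu²" entries in the table)."""
    r = max(0.0, x)
    return r * r


def hidden_dim(model_dim: int, expansion: float) -> int:
    """Hidden width, ASSUMING hidden = expansion * model_dim."""
    return int(round(model_dim * expansion))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def init_matrix(rows: int, cols: int, rng: random.Random):
    """Small uniform random weights; the init scheme is an arbitrary choice."""
    scale = cols ** -0.5
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def mlp_forward(x, w_up, w_down):
    """Up-projection, squared ReLU, down-projection (no biases, no gating)."""
    h = [relu_squared(v) for v in matvec(w_up, x)]
    return matvec(w_down, h)


# Toy sizes so the example runs instantly; real widths are far larger.
rng = random.Random(0)
toy_model_dim = 8
toy_hidden = hidden_dim(toy_model_dim, 3)  # expansion 3, as in PR 636
w_up = init_matrix(toy_hidden, toy_model_dim, rng)
w_down = init_matrix(toy_model_dim, toy_hidden, rng)
y = mlp_forward([1.0] * toy_model_dim, w_up, w_down)
```

Under the stated assumption, `hidden_dim(512, 3)` gives 1536 and `hidden_dim(768, 4)` gives 3072, matching the `hidden_dim` values recorded for PRs 628 and 641.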