
MLP4x

Architecture
Used in: 15 PRs
Best BPB: 1.0764
Avg BPB: 1.1166

Hyperparameters Across PRs

pr_number  parameters
498        {"multiplier":4,"hidden_size":2560}
842        {"layers":5,"model_dim":512,"mlp_mult":4,"hidden":2048,"num_heads":8,"num_kv_heads":4}
1052
1260       {"multiplier":4}
1287       {"multiplier":4}
1291       {"multiplier":4}
1296
1326
1334       {"vocab_size":4096}
1392       {"multiplier":4}
1423       {"multiplier":4}
1477
1555       {"multiplier":4}
1658       {"multiplier":4,"hidden_dim":2048}
1747       {"layers":11,"intermediate_dim":2048}
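The relationship between the multiplier and the hidden width in the table can be sketched as follows. This is a minimal illustration, not code from any of the listed PRs: the function names are hypothetical, and the parameter count assumes a plain two-matrix MLP (one up-projection, one down-projection, no gating), which the table does not confirm.

```python
def mlp_hidden_width(model_dim: int, mult: int = 4) -> int:
    """Hidden (intermediate) width of an MLP block with expansion factor `mult`."""
    return model_dim * mult


def mlp_param_count(model_dim: int, mult: int = 4, bias: bool = False) -> int:
    """Parameter count of an MLP block.

    Assumption: a plain two-matrix MLP (up-projection then down-projection).
    Gated variants would use a third matrix and are not covered here.
    """
    hidden = mlp_hidden_width(model_dim, mult)
    params = 2 * model_dim * hidden  # up-projection + down-projection weights
    if bias:
        params += hidden + model_dim  # one bias vector per projection
    return params


# PR 842 lists model_dim=512, mlp_mult=4, hidden=2048, which matches 512 * 4.
assert mlp_hidden_width(512, 4) == 2048
```

If PR 498 follows the same rule, its hidden_size of 2560 would correspond to a model width of 640, though the table does not state the model width for that PR.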