MLP6

Architecture
Used in: 2 PRs
Best BPB: 1.1636
Avg BPB: 1.1636

Hyperparameters Across PRs

pr_number    parameters
1897         {"layers":5,"mlp_mult":6,"model_dim":512}
1917         {"layers":5,"mlp_mult":6,"model_dim":512}
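The hyperparameters listed above can be turned into concrete layer sizes with a short sketch. It assumes that mlp_mult is the ratio of the MLP hidden width to model_dim, and that each layer's MLP is a plain up-projection plus down-projection with no biases or gating. Neither assumption is confirmed by this page, and the helper names mlp_hidden_dim and mlp_weight_count are hypothetical.

```python
import json

# Hyperparameters copied from the table above (identical for both PRs).
params = json.loads('{"layers":5,"mlp_mult":6,"model_dim":512}')


def mlp_hidden_dim(p):
    # Assumption: mlp_mult is the ratio of MLP hidden width to model_dim.
    return p["mlp_mult"] * p["model_dim"]


def mlp_weight_count(p):
    # Assumption: one up-projection and one down-projection per layer,
    # with no bias terms and no gating branch.
    per_layer = 2 * p["model_dim"] * mlp_hidden_dim(p)
    return p["layers"] * per_layer


print(mlp_hidden_dim(params))    # 3072
print(mlp_weight_count(params))  # 15728640
```

Under these assumptions the MLP blocks alone would hold roughly 15.7M weights. A gated MLP variant or bias terms would change that figure.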