MLP hidden size

Architecture
Used in: 3 PRs
Best BPB: 1.1804
Avg BPB: 1.2427

Hyperparameters Across PRs

pr_number  parameters
42         {"mlp_hidden":992}
73         {"hidden_size":640}
543        {"hidden_dim":1408}
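The three PRs record the value under three different keys (mlp_hidden, hidden_size, hidden_dim), which makes the rows awkward to compare directly. A minimal sketch of one way to normalize them follows. It assumes all three keys refer to the same quantity, which the page's grouping suggests but does not state, and the names ROWS, HIDDEN_KEYS, and mlp_hidden_size are hypothetical helpers introduced only for illustration.

```python
import json

# Rows copied from the table above: (pr_number, raw JSON parameters).
ROWS = [
    (42, '{"mlp_hidden":992}'),
    (73, '{"hidden_size":640}'),
    (543, '{"hidden_dim":1408}'),
]

# Assumption: these three keys all name the MLP hidden width.
HIDDEN_KEYS = ("mlp_hidden", "hidden_size", "hidden_dim")


def mlp_hidden_size(raw_params: str) -> int:
    """Return the hidden size from a JSON parameter string, whichever key it uses."""
    params = json.loads(raw_params)
    for key in HIDDEN_KEYS:
        if key in params:
            return int(params[key])
    raise KeyError(f"no known hidden-size key in {raw_params!r}")


# Map each PR number to its normalized hidden size.
sizes = {pr: mlp_hidden_size(raw) for pr, raw in ROWS}
print(sizes)
```

If hidden_size in PR 73 actually denotes the model width rather than the MLP width, that key should be dropped from HIDDEN_KEYS and the row handled separately.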