MLP width

Category: Architecture
Used in: 3 PRs
Best BPB: 1.1804
Avg BPB: 1.2471

Hyperparameters Across PRs

pr_number  parameters
534        {"hidden_size":1408}
1032       {"model_dim":576,"layers":11}
1052       {"from":3,"to":3.5}