MLP width reduction

Architecture
Used in: 1 PR
Best BPB: 1.1929
Avg BPB: 1.1929

Hyperparameters Across PRs

pr_number   parameters
355         {"mlp_hidden":992}
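To give a rough sense of what a narrower MLP buys, the sketch below counts the weights in one MLP block for the recorded mlp_hidden value. Everything except mlp_hidden = 992 is an assumption made for illustration: the model width (D_MODEL = 512), the plain two-matrix MLP layout without biases, the 4x-width baseline, and the helper name mlp_param_count are all hypothetical and do not come from PR 355.

```python
import json


def mlp_param_count(d_model: int, mlp_hidden: int, bias: bool = False) -> int:
    """Count parameters in a two-layer MLP block: d_model -> mlp_hidden -> d_model.

    Assumes a plain (non-gated) MLP; a gated variant would have a third matrix.
    """
    weights = 2 * d_model * mlp_hidden  # up-projection plus down-projection
    biases = (mlp_hidden + d_model) if bias else 0
    return weights + biases


# Hyperparameters as recorded for PR 355.
params = json.loads('{"mlp_hidden":992}')

# ASSUMPTION: the model width is not stated in the source; 512 is a placeholder.
D_MODEL = 512

reduced = mlp_param_count(D_MODEL, params["mlp_hidden"])
# ASSUMPTION: the baseline uses the conventional 4x hidden expansion.
baseline = mlp_param_count(D_MODEL, 4 * D_MODEL)

print(f"reduced MLP params per block:  {reduced}")
print(f"baseline MLP params per block: {baseline}")
print(f"fraction of baseline:          {reduced / baseline:.3f}")
```

Under these assumptions the MLP parameter count scales linearly with mlp_hidden, so the saving depends only on the ratio between the reduced and baseline hidden widths. Swap in the real model width and baseline to get figures that apply to the actual run.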