larger MLP

Architecture
Used in: 1 PR
Best BPB: 1.2236
Avg BPB: 1.2236

Hyperparameters Across PRs

pr_number  parameters
812        {"mlp_multiplier": 2.65}
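As a rough illustration of what a setting like mlp_multiplier may control, the sketch below assumes the multiplier scales the model width to give the MLP hidden width, and that the MLP is a plain up-projection followed by a down-projection. Neither assumption is confirmed by this page. The value d_model = 512, the rounding rule, and the helper names are hypothetical and chosen only for the example.

```python
def mlp_hidden_dim(d_model: int, mlp_multiplier: float) -> int:
    """Hidden width of the MLP, ASSUMING hidden = round(multiplier * d_model)."""
    return int(round(d_model * mlp_multiplier))


def mlp_param_count(d_model: int, mlp_multiplier: float) -> int:
    """Weight count of an assumed two-matrix MLP (up + down projection), biases omitted."""
    hidden = mlp_hidden_dim(d_model, mlp_multiplier)
    return 2 * d_model * hidden


# Hyperparameters as recorded for PR 812 in the table above.
params = {"mlp_multiplier": 2.65}

# Hypothetical model width; the page does not state the real one.
d_model = 512

hidden = mlp_hidden_dim(d_model, params["mlp_multiplier"])
weights = mlp_param_count(d_model, params["mlp_multiplier"])
print(f"hidden width: {hidden}, MLP weights per block: {weights}")
```

Under these assumptions, the MLP weight count grows linearly with the multiplier, so moving from 2.0 to 2.65 would add about a third more MLP weights per block.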