
MLP expansion

Architecture
Used in: 5 PRs
Best BPB: 1.1355
Avg BPB: 1.1880

Hyperparameters Across PRs

pr_number  parameters
579        {"hidden_dim":2560,"activation":"relu-squared"}
592        {"expansion_factor":3,"hidden_dim":1536}
966        {"baseline":"2.00x","short_conv":"1.99x","moc":"1.93x"}
1315       {"scale_vs_baseline":2.65}
1551       {"baseline_expand":"2x","temporary_expand":"4x","effective_training_width":"8x"}
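As a rough illustration of how these parameters relate, the sketch below computes the hidden width and weight count of an MLP block from an expansion factor, and defines the relu-squared activation named for PR 579. The helper names are hypothetical, and the model width of 512 is an assumption inferred from PR 592's values (1536 / 3). The table does not state it, and the PRs' actual code may be organized differently.

```python
def mlp_hidden_dim(model_dim: int, expansion_factor: float) -> int:
    """Hidden width of an MLP block: model width scaled by the expansion factor."""
    return int(round(model_dim * expansion_factor))


def mlp_param_count(model_dim: int, hidden_dim: int) -> int:
    """Weight count of a bias-free MLP with one up- and one down-projection."""
    return 2 * model_dim * hidden_dim


def relu_squared(x: float) -> float:
    """relu-squared activation: max(x, 0) squared."""
    return max(x, 0.0) ** 2


# Assumption: model width 512, inferred from hidden_dim 1536 / expansion_factor 3.
model_dim = 512
hidden = mlp_hidden_dim(model_dim, 3)
print(hidden)                              # 1536
print(mlp_param_count(model_dim, hidden))  # 1572864
```

Under the same two-matrix assumption, the weight count grows linearly with the expansion factor, so moving from a 2x to a 3x expansion adds 50% more MLP weights at a fixed model width.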