depth and MLP width increase

Category: Architecture
Used in: 1 PR
Best BPB: 1.2026
Avg BPB: 1.2026

Hyperparameters Across PRs

PR number | Parameters
426 | {"layers":10,"mlp_mult":3,"hidden_size":1536,"dim":512,"heads":8,"kv_heads":4}
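To make the PR 426 row concrete, the sketch below gives a rough count of the transformer-block weights these hyperparameters imply. It rests on assumptions the page does not confirm: bias-free Q/K/V/O projections, grouped-query attention where K and V use kv_heads × (dim / heads) outputs, and a two-matrix MLP (dim to hidden_size and back). It also leaves out embeddings, norms, and the output head. The function name `block_params` is hypothetical.

```python
# Hypothetical parameter-count sketch for the PR 426 config.
# Assumptions (not stated in the source): bias-free Q/K/V/O projections,
# grouped-query attention for K/V, a two-matrix MLP (dim -> hidden_size -> dim),
# and no counting of embeddings, norms, or the output head.

config = {"layers": 10, "mlp_mult": 3, "hidden_size": 1536,
          "dim": 512, "heads": 8, "kv_heads": 4}


def block_params(dim: int, heads: int, kv_heads: int, hidden_size: int) -> int:
    """Weights in one transformer block under the assumptions above."""
    head_dim = dim // heads          # 512 // 8 = 64
    kv_dim = kv_heads * head_dim     # K/V projections are narrower with fewer KV heads
    attn = dim * dim                 # query projection
    attn += 2 * dim * kv_dim         # key and value projections
    attn += dim * dim                # output projection
    mlp = 2 * dim * hidden_size      # up and down projections
    return attn + mlp


# Under these assumptions hidden_size should equal mlp_mult * dim.
assert config["hidden_size"] == config["mlp_mult"] * config["dim"]

per_block = block_params(config["dim"], config["heads"],
                         config["kv_heads"], config["hidden_size"])
total = config["layers"] * per_block
print(per_block, total)  # prints 2359296 23592960
```

With these assumptions the MLP holds two thirds of each block's weights (1,572,864 of 2,359,296), so the MLP width increase accounts for most of the growth per added layer.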