11L Transformer

Architecture
Used in: 1 PR
Best BPB: 0.9674
Avg BPB: 0.9674

Hyperparameters Across PRs

pr_number  parameters
727        {"layers":11,"d_model":512,"gqa_heads":8,"kv_heads":4,"mlp_multiplier":3}
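
The hyperparameters recorded for PR 727 imply a few derived dimensions, which the following minimal Python sketch works out. It rests on assumptions not stated on this page: that `gqa_heads` is the number of query heads in grouped-query attention, that `mlp_multiplier` is the ratio of MLP hidden width to `d_model`, and that each block uses bias-free Q/K/V/O projections and a plain two-matrix MLP. The `TransformerConfig` class and its method names are hypothetical, introduced only for illustration, and the parameter estimate excludes embeddings and normalization layers.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical container for the hyperparameters listed for PR 727."""
    layers: int
    d_model: int
    gqa_heads: int       # assumed: number of query heads
    kv_heads: int        # number of shared key/value heads
    mlp_multiplier: int  # assumed: MLP hidden width / d_model

    @property
    def head_dim(self) -> int:
        return self.d_model // self.gqa_heads

    @property
    def queries_per_kv_head(self) -> int:
        return self.gqa_heads // self.kv_heads

    @property
    def mlp_hidden(self) -> int:
        return self.d_model * self.mlp_multiplier

    def block_params(self) -> int:
        # Assumption: no biases, no gated MLP, norms ignored.
        d = self.d_model
        kv_dim = self.kv_heads * self.head_dim
        attention = d * d + 2 * d * kv_dim + d * d  # Q, K and V, output
        mlp = 2 * d * self.mlp_hidden               # up and down projections
        return attention + mlp


raw = '{"layers":11,"d_model":512,"gqa_heads":8,"kv_heads":4,"mlp_multiplier":3}'
cfg = TransformerConfig(**json.loads(raw))

print("head_dim:", cfg.head_dim)
print("query heads per KV head:", cfg.queries_per_kv_head)
print("mlp_hidden:", cfg.mlp_hidden)
print("params in blocks (approx.):", cfg.layers * cfg.block_params())
```

Under these assumptions each head is 64 wide, two query heads share each key/value head, and the MLP hidden width is 1536. A gated MLP or tied embeddings would change the parameter estimate.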