MLA (Multi-head Latent Attention)
Architecture
Used in: 3 PRs
Best BPB (bits per byte, lower is better): 1.2838
Avg BPB: 1.3687
Hyperparameters Across PRs
| PR | Hyperparameters |
|---|---|
| 354 | {"kv_rank":128,"num_heads":8,"num_kv_heads":4} |
| 739 | — |
| 1589 | {"latent_dim":128,"kv_compression_ratio":2} |
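The hyperparameters above (`kv_rank`, `latent_dim`) set the size of the shared low-rank latent that MLA caches in place of full per-head keys and values. Below is a minimal pure-Python sketch of that key/value path. It assumes the usual MLA formulation: hidden states are down-projected to a `kv_rank`-sized latent, and per-head keys and values are rebuilt from it by up-projections. The class and function names, `model_dim = 512`, `head_dim = 64`, and the random weights are all hypothetical. The sketch leaves out the query path and rotary embeddings, and it does not model how PR 354 combines `kv_rank` with `num_kv_heads`.

```python
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of row lists."""
    bt = list(zip(*b))  # transpose once so each column is a tuple
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def rand_matrix(rows, cols, rng, scale=0.02):
    """Gaussian-initialised matrix (hypothetical init, for illustration only)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MLAKVProjection:
    """Low-rank K/V path of multi-head latent attention (sketch).

    Hidden states of width model_dim are compressed to a latent of width
    kv_rank. Only that latent would need caching; keys and values for all
    heads are reconstructed from it by two up-projections.
    """

    def __init__(self, model_dim, kv_rank, num_heads, head_dim, seed=0):
        rng = random.Random(seed)
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.w_down = rand_matrix(model_dim, kv_rank, rng)            # model_dim -> kv_rank
        self.w_up_k = rand_matrix(kv_rank, num_heads * head_dim, rng)  # kv_rank -> all key heads
        self.w_up_v = rand_matrix(kv_rank, num_heads * head_dim, rng)  # kv_rank -> all value heads

    def compress(self, x):
        """x: seq_len rows of model_dim -> seq_len rows of kv_rank (the cached latent)."""
        return matmul(x, self.w_down)

    def expand(self, latent):
        """Latent -> (keys, values), each seq_len rows of num_heads * head_dim."""
        return matmul(latent, self.w_up_k), matmul(latent, self.w_up_v)


def cache_floats_per_token(num_heads, num_kv_heads, head_dim, kv_rank):
    """Values cached per token per layer: full MHA, grouped KV heads, MLA latent."""
    return {
        "mha": 2 * num_heads * head_dim,
        "gqa": 2 * num_kv_heads * head_dim,
        "mla": kv_rank,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    seq_len, model_dim = 4, 512  # hypothetical sizes
    mla = MLAKVProjection(model_dim, kv_rank=128, num_heads=8, head_dim=64)
    x = rand_matrix(seq_len, model_dim, rng, scale=1.0)
    latent = mla.compress(x)
    keys, values = mla.expand(latent)
    print(len(latent[0]), len(keys[0]))  # 128 512
    print(cache_floats_per_token(8, 4, 64, 128))  # {'mha': 1024, 'gqa': 512, 'mla': 128}
```

Under these assumptions each token caches 128 latent values per layer. That compares with 1024 for full keys and values across 8 heads, and 512 with 4 KV heads. The trade-off is the extra up-projection work needed to rebuild keys and values from the latent.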