
MLA (Multi-head Latent Attention)

Category: Architecture
Used in: 3 PRs
Best BPB (bits per byte, lower is better): 1.2838
Average BPB: 1.3687
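The BPB figures are bits per byte. Assuming the leaderboard uses the usual definition (the summed cross-entropy over the evaluation text, converted from nats to bits and divided by the text's byte count), the conversion can be sketched as follows. The function name and example numbers are hypothetical.

```python
import math

def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed token-level negative log-likelihood (in nats) to bits per byte."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    # Divide by ln(2) to turn nats into bits, then normalize by the byte count.
    return total_nll_nats / (math.log(2) * num_bytes)

# Hypothetical example: 1,000 bytes of text scored with a total loss of 700 nats.
print(round(bits_per_byte(700.0, 1000), 4))  # 1.0099
```

Normalizing by bytes rather than tokens keeps scores comparable across runs that use different tokenizers.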

Hyperparameters Across PRs

PR | Parameters
354 | {"kv_rank":128,"num_heads":8,"num_kv_heads":4}
739 | (no parameters listed)
1589 | {"latent_dim":128,"kv_compression_ratio":2}
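The hyperparameter names above (kv_rank, latent_dim) point to the low-rank key/value compression at the core of MLA. Each token's hidden state is projected down to a small latent vector, only that latent is cached, and per-head keys and values are re-expanded from it. The pure-Python sketch below illustrates the idea under stated assumptions. It treats kv_rank/latent_dim as the latent width, uses random weights, omits the decoupled rotary-embedding path and the num_kv_heads grouping seen in PR 354, and does not attempt to interpret kv_compression_ratio. LatentKVAttention and cached_floats_per_token are hypothetical names, not code from any of the listed PRs.

```python
import math
import random

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rand_matrix(rows, cols, rng):
    """Random Gaussian weights scaled by 1/sqrt(fan_in)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class LatentKVAttention:
    """Hypothetical single-sequence causal attention that caches one low-rank
    latent per token instead of full per-head keys and values (RoPE omitted)."""

    def __init__(self, d_model, num_heads, head_dim, kv_rank, seed=0):
        rng = random.Random(seed)
        self.num_heads, self.head_dim = num_heads, head_dim
        inner = num_heads * head_dim
        self.w_q = rand_matrix(inner, d_model, rng)      # queries: d_model -> heads*head_dim
        self.w_dkv = rand_matrix(kv_rank, d_model, rng)  # shared down-projection to the latent
        self.w_uk = rand_matrix(inner, kv_rank, rng)     # latent -> per-head keys
        self.w_uv = rand_matrix(inner, kv_rank, rng)     # latent -> per-head values
        self.cache = []                                  # one kv_rank-sized latent per token

    def step(self, x):
        """Process one token vector x and return the concatenated per-head outputs."""
        self.cache.append(matvec(self.w_dkv, x))  # only the latent is cached
        q = matvec(self.w_q, x)
        # Re-expand keys and values from every cached latent (done naively here for clarity).
        keys = [matvec(self.w_uk, c) for c in self.cache]
        values = [matvec(self.w_uv, c) for c in self.cache]
        scale = 1.0 / math.sqrt(self.head_dim)
        out = []
        for h in range(self.num_heads):
            lo, hi = h * self.head_dim, (h + 1) * self.head_dim
            scores = [scale * sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) for k in keys]
            weights = softmax(scores)
            for d in range(lo, hi):
                out.append(sum(w * v[d] for w, v in zip(weights, values)))
        return out

def cached_floats_per_token(num_heads, head_dim, kv_rank):
    """Per-token, per-layer cache size: full keys plus values versus one latent."""
    return {"full_kv": 2 * num_heads * head_dim, "latent": kv_rank}

attn = LatentKVAttention(d_model=32, num_heads=8, head_dim=4, kv_rank=16)
rng = random.Random(1)
for _ in range(5):
    y = attn.step([rng.gauss(0.0, 1.0) for _ in range(32)])
print(len(y), len(attn.cache), len(attn.cache[0]))  # 32 5 16
# PR 354 lists kv_rank=128 and num_heads=8; head_dim=64 is a hypothetical value.
print(cached_floats_per_token(num_heads=8, head_dim=64, kv_rank=128))  # {'full_kv': 1024, 'latent': 128}
```

With 8 heads and a hypothetical head_dim of 64, a kv_rank of 128 stores 128 floats per token per layer instead of 1024 for full keys and values. Recomputing every key and value at each step keeps the sketch short; the MLA design as described for DeepSeek-V2 instead allows folding the up-projections into the query and output projections at inference so attention can operate on the latents directly.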