
GQA (Grouped-Query Attention)

Architecture
Used in: 1 PR
Best BPB: 1.2364
Avg BPB: 1.2364

Hyperparameters Across PRs

pr_number    parameters
600          {"query_heads":8,"kv_heads":4}
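The parameters listed for PR 600 (query_heads 8, kv_heads 4) mean that each key/value head is shared by 8 / 4 = 2 query heads. Below is a minimal pure-Python sketch of that head grouping. It assumes standard scaled dot-product self-attention with a causal mask and the common convention that consecutive query heads share one KV head (query heads 0 and 1 use KV head 0, and so on). The function names, head_dim and sequence length are illustrative assumptions, not taken from the PR.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def kv_head_for(q_head, n_q_heads, n_kv_heads):
    # Assumed convention: each KV head serves a block of consecutive query heads.
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into KV groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def gqa_attention(q, k, v, causal=True):
    """Grouped-query self-attention on nested lists.

    q: [n_q_heads][seq_len][head_dim]
    k, v: [n_kv_heads][seq_len][head_dim]  (same seq_len as q)
    Returns [n_q_heads][seq_len][head_dim].
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    seq_len, head_dim = len(q[0]), len(q[0][0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for h in range(n_q_heads):
        g = kv_head_for(h, n_q_heads, n_kv_heads)  # shared K/V head for this query head
        head_out = []
        for i in range(seq_len):
            last = i + 1 if causal else seq_len  # causal: attend to positions 0..i only
            scores = [scale * sum(a * b for a, b in zip(q[h][i], k[g][j])) for j in range(last)]
            weights = softmax(scores)
            row = [sum(w * v[g][j][d] for j, w in enumerate(weights)) for d in range(head_dim)]
            head_out.append(row)
        out.append(head_out)
    return out


def kv_cache_entries(n_kv_heads, seq_len, head_dim):
    # Values cached per layer for one sequence: K and V each hold n_kv_heads * seq_len * head_dim.
    return 2 * n_kv_heads * seq_len * head_dim


def rand_heads(n_heads, seq_len, head_dim):
    # Random Gaussian tensor shaped [n_heads][seq_len][head_dim].
    return [[[random.gauss(0.0, 1.0) for _ in range(head_dim)]
             for _ in range(seq_len)] for _ in range(n_heads)]


if __name__ == "__main__":
    random.seed(0)
    n_q, n_kv, T, D = 8, 4, 6, 16  # 8 and 4 match PR 600; T and D are arbitrary
    q = rand_heads(n_q, T, D)
    k = rand_heads(n_kv, T, D)
    v = rand_heads(n_kv, T, D)
    out = gqa_attention(q, k, v)
    print([kv_head_for(h, n_q, n_kv) for h in range(n_q)])  # [0, 0, 1, 1, 2, 2, 3, 3]
    print(kv_cache_entries(n_kv, T, D) / kv_cache_entries(n_q, T, D))  # 0.5
```

Under these assumptions, GQA with 4 KV heads gives the same result as 8-head attention whose K/V heads are copies of the 4 shared ones. Relative to standard multi-head attention with 8 KV heads, it halves the K/V projection outputs and the KV cache while keeping 8 distinct query heads.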