GQA (Grouped-Query Attention)
Architecture
Used in: 1 PR
Best BPB: 1.2364
Avg BPB: 1.2364
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 600 | {"query_heads":8,"kv_heads":4} |
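
To make the table row concrete, here is a minimal sketch of what `query_heads=8` and `kv_heads=4` imply for head sharing. It assumes the common convention that consecutive query heads share one key/value head; the helper name `gqa_head_mapping` is hypothetical, and the actual implementation in PR 600 is not shown on this page and may differ.

```python
def gqa_head_mapping(query_heads: int, kv_heads: int) -> list[int]:
    """Return, for each query head index, the KV head index it attends with.

    Assumption: query heads are grouped contiguously, so heads
    0..group_size-1 share KV head 0, the next group shares KV head 1, etc.
    """
    if query_heads % kv_heads != 0:
        raise ValueError("query_heads must be divisible by kv_heads")
    group_size = query_heads // kv_heads
    return [q // group_size for q in range(query_heads)]


# Values taken from the table row for PR 600.
params = {"query_heads": 8, "kv_heads": 4}

mapping = gqa_head_mapping(**params)

# KV-cache size relative to standard multi-head attention with the same
# head dimension (where kv_heads would equal query_heads).
kv_cache_ratio = params["kv_heads"] / params["query_heads"]

print(mapping)         # [0, 0, 1, 1, 2, 2, 3, 3]
print(kv_cache_ratio)  # 0.5
```

With these values each key/value head serves two query heads, so the key/value cache is half the size it would be with eight key/value heads of the same dimension.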