attention projections
Architecture
Used in: 2 PRs
Best BPB: 1.2207
Avg BPB: 1.2216
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 1264 | {"q_levels":[[2,8],[4,16],[8,40]],"kv_levels":[[1,16],[2,16],[4,32]]} |
| 1315 | — |