← Back to Architecture

attention projections

Architecture
Used in
2 PRs
Best BPB
1.2207
Avg BPB
1.2216

Hyperparameters Across PRs

pr_numberparameters
1264{"q_levels":[[2,8],[4,16],[8,40]],"kv_levels":[[1,16],[2,16],[4,32]]}
1315