QK depth ramp
Architecture
Used in: 1 PR
Best BPB: 1.0809
Avg BPB: 1.0809
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 1688 | {"qk_gain_init":5,"qk_gain_depth_ramp":0.5} |
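
The page records the two parameters but does not say how they combine. As a rough illustration only, the sketch below parses the row for PR 1688 and builds a per-layer gain under one possible reading: a linear ramp from `qk_gain_init` at the first layer to `qk_gain_init + qk_gain_depth_ramp` at the last. The function `qk_gain_schedule`, the linear form, and the depth of 5 layers are all assumptions made for this example, not details taken from the PR.

```python
import json

# Parameters exactly as recorded for PR 1688 in the table above.
row = '{"qk_gain_init":5,"qk_gain_depth_ramp":0.5}'
params = json.loads(row)


def qk_gain_schedule(init, ramp, num_layers):
    """Hypothetical linear ramp (an assumption, not the PR's confirmed formula).

    Layer 0 gets `init`; the last layer gets `init + ramp`.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be >= 1")
    if num_layers == 1:
        return [float(init)]
    return [init + ramp * i / (num_layers - 1) for i in range(num_layers)]


# num_layers=5 is an arbitrary illustrative depth, not taken from the source.
gains = qk_gain_schedule(
    params["qk_gain_init"], params["qk_gain_depth_ramp"], num_layers=5
)
print(gains)  # [5.0, 5.125, 5.25, 5.375, 5.5]
```

If the actual implementation scales the gain multiplicatively or ramps in the opposite direction, only the body of `qk_gain_schedule` would need to change.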