QK depth ramp

Category: Architecture
Used in: 1 PR
Best BPB: 1.0809
Avg BPB: 1.0809

Hyperparameters Across PRs

pr_number    parameters
1688         {"qk_gain_init":5,"qk_gain_depth_ramp":0.5}
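
The page does not define how the two parameters combine, so the following is only a minimal sketch of one plausible reading: a QK gain that starts at `qk_gain_init` in the first layer and grows linearly by `qk_gain_depth_ramp` in total by the last layer. The additive linear formula, the function name `qk_gain_schedule`, and the layer count of 5 are all assumptions made for illustration; they are not taken from PR 1688.

```python
import json

# Parameter record copied verbatim from the table above (PR 1688).
record = '{"qk_gain_init":5,"qk_gain_depth_ramp":0.5}'
params = json.loads(record)


def qk_gain_schedule(init, ramp, num_layers):
    """Return one gain value per layer.

    ASSUMPTION (not confirmed by the source): the gain ramps linearly
    with relative depth, from `init` at the first layer to `init + ramp`
    at the last layer.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [float(init)]
    return [init + ramp * layer / (num_layers - 1) for layer in range(num_layers)]


# num_layers=5 is a hypothetical depth chosen only to keep the output short.
gains = qk_gain_schedule(
    params["qk_gain_init"], params["qk_gain_depth_ramp"], num_layers=5
)
print(gains)  # [5.0, 5.125, 5.25, 5.375, 5.5]
```

Under a different reading (for example a multiplicative ramp, or a learned gain that is only initialized this way), the schedule would differ; the PR itself is the authority on the actual definition.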