attention modifications

Category: Architecture
Used in: 4 PRs
Best BPB: 1.3288
Avg BPB: 1.4080
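The BPB figures above are read here as bits per byte: the model's cross-entropy converted from nats to bits and normalized by the byte length of the evaluated text rather than by token count. This makes results comparable across tokenizers. The page does not say how its numbers were computed, so the following is only a minimal sketch under that assumption. The helper names `bits_per_byte` and `bpb_from_mean_loss` are hypothetical.

```python
import math


def bits_per_byte(total_loss_nats: float, num_bytes: int) -> float:
    """Convert summed cross-entropy (nats, over all tokens) into bits per byte.

    Assumption: total_loss_nats is the sum of per-token negative
    log-likelihoods (natural log) over a text that is num_bytes long.
    """
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    # nats -> bits, then normalize by bytes instead of tokens
    return total_loss_nats / math.log(2) / num_bytes


def bpb_from_mean_loss(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Same metric, starting from the mean per-token loss (hypothetical helper)."""
    return bits_per_byte(mean_loss_nats * num_tokens, num_bytes)
```

For example, a mean loss of 2.0 nats/token over 1000 tokens covering 2500 bytes gives `bpb_from_mean_loss(2.0, 1000, 2500)`, about 1.154 BPB. A tokenizer that packs more bytes into each token lowers BPB for the same per-token loss.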

Hyperparameters Across PRs

PR number  Parameters
981        (none listed)
1074       {"hyperbolic_qk_mix":0.02,"hyperbolic_radius_init":0.1}
1168       (none listed)
1558       {"qk_gain":4}
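The page lists `qk_gain` for PR 1558 but does not define it. One plausible reading is a scalar multiplier on the query-key attention logits: a value above 1 sharpens the softmax, and 1.0 recovers standard scaled dot-product attention. The sketch below assumes that reading for a single query. It does not say whether PR 1558 uses the value as a fixed constant or only as the initialization of a learned parameter. The function names are hypothetical, and the hyperbolic parameters from PR 1074 are not modeled.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q: Vector, keys: List[Vector], qk_gain: float = 1.0) -> Vector:
    """Single-query attention weights with a scalar gain on the QK logits.

    ASSUMPTION: qk_gain multiplies the usual 1/sqrt(d)-scaled dot products;
    qk_gain=1.0 gives standard scaled dot-product attention.
    """
    d = len(q)
    scale = qk_gain / math.sqrt(d)
    logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(logits)
```

With `q = [1.0, 0.0]` and keys `[[1.0, 0.0], [0.0, 1.0]]`, the weight on the matching key rises from about 0.67 at `qk_gain=1.0` to about 0.94 at `qk_gain=4.0`. This is the sharpening effect a gain of 4 would have under this interpretation.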