depth-aware constant scale init

Initialization
Used in: 1 PR
Best BPB: 1.1920
Avg BPB: 1.1920

Hyperparameters Across PRs

pr_number | parameters
1496 | {"attn_scale":{"early":1,"mid":1.75,"late":2.5},"mlp_scale":{"early":1,"mid":1.15,"late":1.3}}
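
The parameters listed for PR 1496 give one constant per depth bucket ("early", "mid", "late") for the attention and MLP scales. A minimal sketch of how such a config could be expanded into per-layer initial values follows. The page does not say how layers are assigned to buckets, so the equal-thirds split below is an assumption, and the names `depth_bucket` and `initial_scales` are hypothetical helpers, not taken from the PR.

```python
import json

# Hyperparameters exactly as listed for PR 1496.
CONFIG = json.loads(
    '{"attn_scale":{"early":1,"mid":1.75,"late":2.5},'
    '"mlp_scale":{"early":1,"mid":1.15,"late":1.3}}'
)

BUCKETS = ("early", "mid", "late")


def depth_bucket(layer_idx: int, num_layers: int) -> str:
    """Map a layer index to a depth bucket.

    ASSUMPTION: the stack is split into equal thirds. The actual
    bucket boundaries used in the PR are not stated in the source.
    """
    if not 0 <= layer_idx < num_layers:
        raise ValueError("layer_idx must be in [0, num_layers)")
    return BUCKETS[layer_idx * 3 // num_layers]


def initial_scales(num_layers: int) -> list:
    """Return one {attn_scale, mlp_scale} dict per layer, in depth order."""
    scales = []
    for i in range(num_layers):
        bucket = depth_bucket(i, num_layers)
        scales.append({
            "attn_scale": float(CONFIG["attn_scale"][bucket]),
            "mlp_scale": float(CONFIG["mlp_scale"][bucket]),
        })
    return scales


if __name__ == "__main__":
    # Hypothetical 9-layer model: three layers fall in each bucket.
    for i, s in enumerate(initial_scales(9)):
        print(i, depth_bucket(i, 9), s)
```

In a real model these constants would presumably seed per-layer scale parameters applied to the attention and MLP branch outputs; whether they stay fixed or are trained afterwards is not stated here.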