← Back to Regularization

LN scale depth damping

Regularization
Used in
1 PRs
Best BPB
1.1327
Avg BPB
1.1327

Hyperparameters Across PRs

pr_numberparameters
489{"init_scale_rule":"1/sqrt(layer_idx+1)"}