LN scaling

Regularization
Used in: 1 PR
Best BPB: 1.0465
Avg BPB: 1.0465

Hyperparameters Across PRs

pr_number    parameters
758          {"scale":"1/sqrt(layer+1)"}
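The recorded parameter gives only the formula `1/sqrt(layer+1)`. As a rough illustration of what such a depth-dependent factor could look like, here is a minimal sketch in plain Python. It assumes that `layer` is a 0-indexed layer number and that the factor multiplies the output of a layer normalization without learnable gain or bias. Neither assumption is confirmed by the source, and the names `ln_scale` and `scaled_layer_norm` are hypothetical, not taken from PR 758.

```python
import math


def ln_scale(layer: int) -> float:
    """Return the factor 1/sqrt(layer+1).

    Assumption: `layer` is a 0-indexed layer number, so the first
    layer is left unscaled and deeper layers are scaled down.
    """
    if layer < 0:
        raise ValueError("layer index must be non-negative")
    return 1.0 / math.sqrt(layer + 1)


def scaled_layer_norm(x, layer, eps=1e-5):
    """Normalize `x` to zero mean and unit variance, then apply ln_scale.

    Assumption: the factor is applied to the normalized output.
    The source does not say where the scale is applied.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    s = ln_scale(layer)
    return [(v - mean) * inv_std * s for v in x]


if __name__ == "__main__":
    # Print the factor for the first few layers.
    for layer in range(4):
        print(layer, ln_scale(layer))
```

Under these assumptions the factor is 1.0 at layer 0 and 0.5 at layer 3, so the normalized activations shrink with depth.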