layerwise LN scaling

Regularization
Used in: 1 PR
Best BPB: 0.8881
Avg BPB: 0.8881

Hyperparameters Across PRs

pr_number    parameters
795