weight decay and layerwise LN scale
Category: Regularization
Used in: 1 PR
Best BPB: 1.0944
Avg BPB: 1.0944
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 644 | {"weight_decay":0.04,"LN_scale":"1/sqrt(layer+1)"} |
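
The two settings in the table can be illustrated with a minimal sketch. It is not taken from PR 644. It assumes that `layer` is a 0-indexed block index, that the `1/sqrt(layer+1)` factor multiplies the LayerNorm output, and that weight decay is applied in decoupled (AdamW-style) form. The helper names `ln_scale`, `scaled_layer_norm`, and `decay_weights` are hypothetical.

```python
import math

WEIGHT_DECAY = 0.04  # value listed for PR 644


def ln_scale(layer: int) -> float:
    """Layerwise LN scale 1/sqrt(layer+1); assumes a 0-indexed layer."""
    return 1.0 / math.sqrt(layer + 1)


def scaled_layer_norm(x, layer, eps=1e-5):
    """Normalize x to zero mean and unit variance, then apply the layer's scale.

    Assumption: the scale multiplies the normalized output; the actual
    placement in the PR may differ.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    s = ln_scale(layer)
    return [s * (v - mean) * inv_std for v in x]


def decay_weights(weights, lr, weight_decay=WEIGHT_DECAY):
    """One decoupled weight decay step: w <- w * (1 - lr * weight_decay)."""
    return [w * (1.0 - lr * weight_decay) for w in weights]


if __name__ == "__main__":
    # Deeper layers get a smaller scale under this schedule.
    for layer in range(4):
        print(layer, ln_scale(layer))
```

Under these assumptions the scale shrinks with depth, so deeper blocks contribute smaller normalized activations, while the decay step shrinks every weight by the same relative amount per update.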