weight decay and layerwise LN scale

Regularization
Used in: 1 PR
Best BPB: 1.0944
Avg BPB: 1.0944

Hyperparameters Across PRs

pr_number  parameters
644        {"weight_decay":0.04,"LN_scale":"1/sqrt(layer+1)"}
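The two parameters recorded for PR 644 can be illustrated with a minimal, standard-library-only Python sketch. The page does not describe the implementation, so several things here are assumptions: `layer` is treated as a zero-based layer index, the `1/sqrt(layer+1)` factor is applied to the LayerNorm output, and weight decay is applied in decoupled (AdamW-style) form on top of a plain gradient step. The function names `ln_scale`, `layer_norm`, and `decayed_step` are hypothetical and do not come from the PR.

```python
import math

WEIGHT_DECAY = 0.04  # value listed for PR 644


def ln_scale(layer: int) -> float:
    """Return 1/sqrt(layer+1); zero-based layer indexing is an assumption."""
    return 1.0 / math.sqrt(layer + 1)


def layer_norm(x, scale=1.0, eps=1e-5):
    """Normalize x to zero mean and unit variance, then multiply by scale.

    Applying the layerwise factor to the normalized output is an assumption;
    the PR may apply it elsewhere (for example, to the learned gain).
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [scale * (v - mean) * inv_std for v in x]


def decayed_step(w, grad, lr, weight_decay=WEIGHT_DECAY):
    """One gradient step with decoupled weight decay (assumed form):
    w <- w - lr * grad - lr * weight_decay * w
    """
    return [wi - lr * gi - lr * weight_decay * wi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    # Deeper layers get a smaller LayerNorm output scale.
    for layer in range(4):
        print(layer, ln_scale(layer))
    print(layer_norm([1.0, 3.0], scale=ln_scale(3)))
    print(decayed_step([1.0], [0.0], lr=0.1))
```

Under these assumptions, the first layer keeps a scale of 1 and the fourth layer (index 3) is scaled by 0.5, so normalized activations shrink with depth, while the decay term pulls each weight toward zero by a fraction `lr * weight_decay` per step.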