learning rate scaling

LR Schedule
Used in: 1 PR
Best BPB: 1.1428
Avg BPB: 1.1428

Hyperparameters Across PRs

pr_number | parameters
648 | {"scale":"1/sqrt(num_loops)"}
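The "scale" value recorded for PR 648 can be read as a multiplier applied to the learning rate. The following is a minimal sketch of that reading, not the PR's actual code: the function name `scale_learning_rate` and the `base_lr` argument are hypothetical, and `num_loops` is assumed to be a positive integer loop count, since the page does not define it.

```python
import math


def scale_learning_rate(base_lr: float, num_loops: int) -> float:
    """Return base_lr multiplied by 1/sqrt(num_loops).

    Hypothetical helper: base_lr and the meaning of num_loops are
    assumptions; only the formula 1/sqrt(num_loops) comes from the table.
    """
    if num_loops < 1:
        raise ValueError("num_loops must be a positive integer")
    return base_lr / math.sqrt(num_loops)


if __name__ == "__main__":
    # Show how the multiplier shrinks as the loop count grows.
    for loops in (1, 2, 4, 8):
        print(loops, scale_learning_rate(0.02, loops))
```

Under this reading, one loop leaves the learning rate unchanged and four loops halve it.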