learning rate scaling
Category: LR Schedule
Used in: 1 PR
Best BPB: 1.1428
Avg BPB: 1.1428
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 648 | {"scale":"1/sqrt(num_loops)"} |
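The `scale` expression recorded for PR 648 can be read as a multiplier on a base learning rate. The following is a minimal sketch under that reading, not the PR's actual code: the function name `scaled_lr` and the `base_lr` argument are hypothetical, and treating `num_loops` as a positive integer count is an assumption, since the table gives only the expression itself.

```python
import math


def scaled_lr(base_lr: float, num_loops: int) -> float:
    """Return base_lr multiplied by 1/sqrt(num_loops).

    Hypothetical helper: only the expression "1/sqrt(num_loops)" comes
    from the table; the name and signature are illustrative assumptions.
    """
    if num_loops < 1:
        raise ValueError("num_loops must be a positive integer")
    return base_lr / math.sqrt(num_loops)


if __name__ == "__main__":
    # Show how the multiplier shrinks the rate as num_loops grows.
    for n in (1, 2, 4, 8):
        print(n, scaled_lr(1.0, n))
```

Under this reading, `num_loops = 1` leaves the rate unchanged and `num_loops = 4` halves it.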