LR scheduling tuned for a single-device run

LR Schedule
Used in: 2 PRs
Best BPB: 1.4078
Avg BPB: 1.4078

Hyperparameters Across PRs

pr_number    parameters
707          {"gradient_accum_tokens":131000,"iterations":2600}
712          {"gradient_accum_tokens":131000,"iterations":2600}