
warmup and warmdown

LR Schedule
Used in: 2 PRs
Best BPB: 1.1472
Avg BPB: 1.1474

Hyperparameters Across PRs

pr_number  parameters
179        {"warmup_steps":1500,"warmdown_steps":3000}
592        {"warmup_steps":1500,"late_QAT_start_scale":0.15}
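
As a rough illustration of how the `warmup_steps` and `warmdown_steps` values listed for PR 179 could drive a schedule, here is a minimal sketch. It assumes a linear ramp up, a flat plateau, and a linear ramp down to zero. The page does not state the actual schedule shape, and `lr_scale` and `total_steps` are hypothetical names introduced only for this example. PR 592's `late_QAT_start_scale` is not modeled here.

```python
def lr_scale(step, total_steps, warmup_steps=1500, warmdown_steps=3000):
    """Return a multiplier in [0, 1] to apply to the base learning rate.

    Assumed shape (not confirmed by the source): linear warmup,
    constant plateau, linear warmdown to zero at total_steps.
    """
    # Warmup: ramp linearly from 1/warmup_steps up to 1.0.
    if step < warmup_steps:
        return (step + 1) / warmup_steps

    # Warmdown: the last `warmdown_steps` steps decay linearly to 0.
    warmdown_start = total_steps - warmdown_steps
    if step >= warmdown_start:
        return max(0.0, (total_steps - step) / warmdown_steps)

    # Plateau: run at the full base learning rate.
    return 1.0


if __name__ == "__main__":
    total = 10_000  # hypothetical run length, not taken from the source
    for s in (0, 1499, 5000, 8500, 10_000):
        print(s, lr_scale(s, total))
```

The multiplier would be applied to the optimizer's base learning rate at each step. If the warmup and warmdown windows overlap (that is, `total_steps` is smaller than `warmup_steps + warmdown_steps`), this sketch gives warmup priority, which is one of several reasonable choices.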