warmdown + cosine decay

Category: LR Schedule
Used in: 2 PRs
Best BPB: 0.7227
Avg BPB: 0.9327

Hyperparameters Across PRs

pr_number  parameters
467        {"warmdown_iters":3500,"ttt_cosine_epochs":50}
605        {"warmdown_steps":6000,"per_step_cosine_decay":true}
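For orientation, here is a minimal sketch of what a warmdown with cosine decay might look like. The page does not show how either PR implements its schedule, so everything here is an assumption: the function name lr_multiplier, the total_steps argument, and the choice to hold the learning rate flat and then cosine-decay it to zero over the final warmdown_steps steps are all hypothetical. Only the value 6000 is taken from the table (PR 605).

```python
import math


def lr_multiplier(step: int, total_steps: int, warmdown_steps: int) -> float:
    """Hypothetical schedule: 1.0 until the warmdown begins, then a
    cosine decay from 1.0 to 0.0 over the last `warmdown_steps` steps.

    This is an illustrative assumption, not the code from either PR.
    """
    if warmdown_steps <= 0:
        return 1.0  # no warmdown configured: keep the base learning rate

    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0  # flat phase before the warmdown

    # Fraction of the warmdown completed, clamped to [0, 1].
    progress = min(1.0, (step - warmdown_start) / warmdown_steps)
    # Half-cosine: 1.0 at progress 0, 0.5 at progress 0.5, 0.0 at progress 1.
    return 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 10_000  # hypothetical run length, not stated on this page
    for s in (0, 4_000, 7_000, 10_000):
        print(s, round(lr_multiplier(s, total, warmdown_steps=6_000), 4))
```

The multiplier would be applied to a base learning rate at each step. Evaluating it per step rather than per epoch is one possible reading of the per_step_cosine_decay flag, but the page does not confirm that.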