
Linear warmup + warmdown

Category: LR Schedule
Used in: 3 PRs
Best BPB: 1.0321
Average BPB: 1.1041

Hyperparameters Across PRs

PR     Parameters
65     {"warmup_steps": 20, "warmdown_iters": 3000}
399    {"muon_momentum_warmup_steps": 1500, "warmdown_iters": 3000}
755    {"warmup_steps": 50, "warmdown_iters": 2500}
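To make these hyperparameters concrete, below is a minimal Python sketch of a linear warmup + warmdown multiplier applied to the base learning rate. It assumes that `warmup_steps` is a linear LR ramp up to the base rate, that `warmdown_iters` is a linear decay to zero over the final iterations of training, and that the total step count is known in advance. `total_steps`, `lr_multiplier`, and `muon_momentum` are hypothetical names and do not come from the PRs. PR 399 lists `muon_momentum_warmup_steps` instead of an LR warmup, so the sketch also includes a linear ramp of the Muon optimizer's momentum. Its 0.85 and 0.95 start and end values are illustrative assumptions.

```python
def lr_multiplier(step: int, total_steps: int, warmup_steps: int, warmdown_iters: int) -> float:
    """Factor to multiply the base learning rate by at a 0-indexed step (sketch)."""
    # Linear warmup: ramp from 1/warmup_steps up to 1.0 over the first warmup_steps steps.
    if warmup_steps > 0 and step < warmup_steps:
        return (step + 1) / warmup_steps
    # Linear warmdown: decay from 1.0 toward 0.0 over the final warmdown_iters steps.
    warmdown_start = total_steps - warmdown_iters
    if warmdown_iters > 0 and step >= warmdown_start:
        return max(0.0, (total_steps - step) / warmdown_iters)
    # Constant plateau between the two ramps.
    return 1.0


def muon_momentum(step: int, warmup_steps: int, start: float = 0.85, end: float = 0.95) -> float:
    """Linearly interpolate Muon momentum from `start` to `end` (values are assumptions)."""
    frac = min(step / warmup_steps, 1.0) if warmup_steps > 0 else 1.0
    return (1.0 - frac) * start + frac * end
```

For example, with PR 65's values (`warmup_steps=20`, `warmdown_iters=3000`) and an assumed 10,000-step run, the multiplier is 0.05 at step 0, 1.0 from step 19 through step 7,000, and 0.5 at step 8,500. Giving warmup precedence keeps the schedule well-defined even if a very short run makes the two ramps overlap.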