cosine decay with linear warmup

LR Schedule
Used in: 3 PRs
Best BPB: 1.0574
Avg BPB: 1.1451

Hyperparameters Across PRs

pr_number  parameters
131        {"warmup_steps":200,"min_lr_ratio":0.05}
612        {"warmup_steps":20,"warmdown_start_step":7000,"total_steps":12000}
684        {"start_lr":0.0005,"end_lr":0.00002,"warmup":"1 epoch"}
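For orientation, here is a minimal sketch of what a cosine decay schedule with linear warmup typically computes. It is not taken from any of the PRs listed above. The parameter names `warmup_steps` and `min_lr_ratio` follow the PR 131 entry; `base_lr`, `total_steps`, and the function name `lr_at_step` are hypothetical and chosen only for illustration.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps, min_lr_ratio=0.0):
    """Learning rate at `step`: linear warmup, then cosine decay.

    Hypothetical helper for illustration; base_lr and total_steps are
    assumptions, not values reported by any PR in the table.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps

    # Fraction of the decay phase completed, clamped to at most 1.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)

    # Cosine factor goes from 1 (start of decay) down to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))

    # Decay from base_lr down to the floor base_lr * min_lr_ratio.
    min_lr = base_lr * min_lr_ratio
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    # warmup_steps and min_lr_ratio as in PR 131; the rest is assumed.
    for s in (0, 199, 200, 600, 1000):
        print(s, lr_at_step(s, base_lr=1e-3, warmup_steps=200,
                            total_steps=1000, min_lr_ratio=0.05))
```

The other two rows use different parameterizations (a warmdown start step in PR 612, explicit start and end rates with an epoch-based warmup in PR 684), which this sketch does not model.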