fixed learning rates

LR Schedule
Used in: 1 PR
Best BPB: 1.1442
Avg BPB: 1.1442

Hyperparameters Across PRs

pr_number  parameters
317        {"matrix_lr":0.025,"scalar_lr":0.025,"tied_embed_lr":0.035}
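As a rough illustration of what "fixed" means here, the sketch below parses the parameters recorded for PR 317 and returns the same learning rate at every step, with no warmup or decay. The helper name fixed_lr and the reading of each key as a per-group rate (matrix, scalar, tied embedding) are assumptions made for this example; the PR's actual training code is not shown on this page.

```python
import json

# Parameters copied verbatim from the PR 317 row.
params = json.loads(
    '{"matrix_lr":0.025,"scalar_lr":0.025,"tied_embed_lr":0.035}'
)

def fixed_lr(group: str, step: int) -> float:
    """Hypothetical helper: return the learning rate for a parameter group.

    The step argument is deliberately ignored, which is what makes the
    schedule fixed (constant) rather than warmed up or decayed.
    """
    return params[f"{group}_lr"]

# The rate for each group is identical at every training step.
for step in (0, 1_000, 100_000):
    rates = {g: fixed_lr(g, step) for g in ("matrix", "scalar", "tied_embed")}
    print(step, rates)
```

In a real training loop, each of these values would typically be assigned to its own optimizer parameter group and left unchanged for the whole run.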