Muon (matrices) and AdamW (embeddings and scalars)
Optimizer
Used in: 1 PR
Best BPB: 1.1355
Avg BPB: 1.1355
Submissions
Hyperparameters Across PRs
| pr_number | weight_decay | momentum | other_params |
|---|---|---|---|
| 579 | — | 0.99 | {"Muon_lr":0.025,"AdamW_embeddings_lr":0.035,"AdamW_scalars_lr":0.025,"gradient_clip":0.3} |
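
The row above implies a split in which matrix-shaped parameters are trained with Muon while embeddings and scalars are trained with AdamW, each at its own learning rate. The following is a minimal sketch of how that routing might look, not the code from PR 579. The function name `split_param_groups`, the parameter names and shapes, the rule that "embed" in a name marks an embedding, and the rule that every other non-2-D parameter counts as a "scalar" are all assumptions made for illustration. Attaching the momentum value to the Muon group is also an assumption, since the table does not say which optimizer it belongs to.

```python
# Hyperparameter values copied from the PR 579 row; everything else is illustrative.
PR_579 = {
    "momentum": 0.99,
    "Muon_lr": 0.025,
    "AdamW_embeddings_lr": 0.035,
    "AdamW_scalars_lr": 0.025,
    "gradient_clip": 0.3,
}


def split_param_groups(param_shapes, hp=PR_579):
    """Route each parameter name to a Muon or AdamW group.

    Assumptions (not confirmed by the source):
      * a name containing "embed" marks an embedding table,
      * any other 2-D parameter is a "matrix" and goes to Muon,
      * everything else (0-D or 1-D) is treated as a "scalar".
    """
    groups = {
        # Assumption: the table's momentum value applies to Muon.
        "muon_matrices": {"lr": hp["Muon_lr"], "momentum": hp["momentum"], "params": []},
        "adamw_embeddings": {"lr": hp["AdamW_embeddings_lr"], "params": []},
        "adamw_scalars": {"lr": hp["AdamW_scalars_lr"], "params": []},
    }
    for name, shape in param_shapes.items():
        if "embed" in name:
            groups["adamw_embeddings"]["params"].append(name)
        elif len(shape) == 2:
            groups["muon_matrices"]["params"].append(name)
        else:
            groups["adamw_scalars"]["params"].append(name)
    return groups


# Hypothetical parameter names and shapes, for illustration only.
example_shapes = {
    "tok_embed.weight": (50304, 768),
    "blocks.0.attn.qkv.weight": (2304, 768),
    "blocks.0.mlp.fc.weight": (3072, 768),
    "blocks.0.attn_scale": (),
    "final_norm.weight": (768,),
}
groups = split_param_groups(example_shapes)
```

In a real training loop, each group's parameter list would be handed to the corresponding optimizer, and the `gradient_clip` value of 0.3 would presumably be applied to gradients before the optimizer step. The table does not say whether that value is a norm or a per-element threshold, so the sketch leaves clipping out.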