Muon + Adam
Optimizer
Used in: 7 PRs
Best BPB: 1.1387
Avg BPB: 1.1914
Submissions
Hyperparameters Across PRs
| PR number | Weight decay | Momentum | Other parameters |
|---|---|---|---|
| 148 | — | — | {"matrix_lr":0.012,"scalar_lr":0.012,"tied_embed_lr":0.015,"grad_clip_norm":0.3} |
| 319 | — | — | {"split_optimizer":true} |
| 537 | — | — | {"Muon":"used for hidden and attention parameters","Adam":"used for embeddings and scalar parameters"} |
| 550 | — | — | {"Muon_scope":"matrices","Adam_scope":"scalars"} |
| 575 | 0.04 | — | {"MATRIX_LR":0.02,"SCALAR_LR":0.02,"TIED_EMBED_LR":0.05} |
| 637 | 0.04 | — | {"embed_lr":0.03,"matrix_lr":0.02,"scalar_lr":0.02} |
| 835 | — | — | {"matrix_lr":0.012,"scalar_lr":0.012,"tied_embed_lr":0.015,"grad_clip_norm":0.3} |
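
The rows for PRs 537 and 550 describe a split in which Muon handles matrix-shaped weights and Adam handles embeddings and scalar parameters. The sketch below shows one way such routing could look. It is an illustration only, not the code from any of the listed PRs: the function `split_param_groups`, the group labels, the name-based embedding check, and the example parameter names and shapes are all hypothetical. The three learning rates are copied from the PR 575 row.

```python
from typing import Dict, Tuple

# Learning rates copied from the PR 575 row of the table.
MATRIX_LR = 0.02
SCALAR_LR = 0.02
TIED_EMBED_LR = 0.05


def split_param_groups(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, dict]:
    """Route each parameter to Muon or Adam (hypothetical routing rule).

    Assumption: embeddings are recognized by "embed" in the name,
    2-D tensors go to Muon, and everything else goes to Adam.
    """
    groups = {
        "muon_matrices": {"optimizer": "Muon", "lr": MATRIX_LR, "params": []},
        "adam_embed": {"optimizer": "Adam", "lr": TIED_EMBED_LR, "params": []},
        "adam_scalars": {"optimizer": "Adam", "lr": SCALAR_LR, "params": []},
    }
    for name, shape in shapes.items():
        if "embed" in name:
            # Embeddings are 2-D but are kept on Adam in this split.
            groups["adam_embed"]["params"].append(name)
        elif len(shape) == 2:
            groups["muon_matrices"]["params"].append(name)
        else:
            groups["adam_scalars"]["params"].append(name)
    return groups


# Hypothetical parameter names and shapes for a small transformer.
example_shapes = {
    "tok_embed.weight": (1024, 512),
    "blocks.0.attn.qkv.weight": (1536, 512),
    "blocks.0.mlp.fc.weight": (2048, 512),
    "blocks.0.attn_scale": (512,),
    "final_norm.weight": (512,),
}

groups = split_param_groups(example_shapes)
for label, group in groups.items():
    print(label, group["optimizer"], group["lr"], group["params"])
```

In a real training script each group would be handed to the corresponding optimizer instance. How weight decay and gradient clipping apply to each group is not stated in the table, so the sketch leaves them out.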