Muon + AdamW
Optimizer, used in 5 PRs
Best BPB: 1.1160 (bits per byte; lower is better)
Average BPB: 1.2892
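"Muon + AdamW" usually means a hybrid setup. Two-dimensional hidden weight matrices are updated with Muon, which orthogonalizes a momentum update via a Newton–Schulz iteration. Embeddings, scalars and other non-matrix parameters are updated with ordinary AdamW. The sketch below illustrates that split under stated assumptions. It is not code from any of the listed PRs. The routing rule in `pick_optimizer`, the parameter names in the usage lines, and the defaults `lr=0.035` and `momentum=0.99` (borrowed from the PR 583 row) are all assumptions. The AdamW half is left to any standard implementation.

```python
# Hypothetical sketch of a "Muon + AdamW" split; not code from any of the PRs above.
# Matrices are lists of row lists so the example runs without third-party packages.

NS_COEFFS = (3.4445, -4.7750, 2.0315)  # quintic Newton-Schulz coefficients from the public Muon reference code


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def pick_optimizer(name, shape):
    """Assumed routing: 2-D non-embedding weights -> Muon, everything else -> AdamW."""
    if len(shape) == 2 and "embed" not in name:
        return "muon"
    return "adamw"


def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g by pushing its singular values toward 1."""
    a, b, c = NS_COEFFS
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / (norm + eps) for v in row] for row in g]  # Frobenius-normalize so every singular value is <= 1
    tall = len(x) > len(x[0])
    if tall:  # work on the wide orientation so the Gram matrix X X^T is the smaller one
        x = transpose(x)
    for _ in range(steps):
        gram = matmul(x, transpose(x))  # A = X X^T
        gram_sq = matmul(gram, gram)  # A^2
        poly = [[b * p + c * q for p, q in zip(r1, r2)] for r1, r2 in zip(gram, gram_sq)]
        bx = matmul(poly, x)  # (bA + cA^2) X
        x = [[a * v + w for v, w in zip(r1, r2)] for r1, r2 in zip(x, bx)]  # X <- aX + (bA + cA^2) X
    return transpose(x) if tall else x


def muon_step(weight, grad, buf, lr=0.035, momentum=0.99):
    """One Muon update for a 2-D weight; mutates `buf` and returns the new weight."""
    for i, row in enumerate(grad):
        for j, gij in enumerate(row):
            buf[i][j] = momentum * buf[i][j] + gij  # momentum buffer
    nesterov = [[gv + momentum * mv for gv, mv in zip(gr, mr)] for gr, mr in zip(grad, buf)]  # Nesterov-style lookahead
    ortho = newton_schulz(nesterov)
    scale = max(1.0, len(weight) / len(weight[0])) ** 0.5  # shape-dependent scale used by some Muon implementations
    return [[w - lr * scale * o for w, o in zip(wr, orow)] for wr, orow in zip(weight, ortho)]
```

Usage, with hypothetical parameter names: `pick_optimizer("blocks.0.attn.c_q.weight", (512, 512))` returns `"muon"`, while `pick_optimizer("tok_embed.weight", (1024, 512))` returns `"adamw"`. Real implementations run the Newton–Schulz loop in reduced precision on the GPU. Plain lists are used here only to keep the sketch dependency-free.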
Hyperparameters Across PRs
| PR | Weight decay | Momentum | Other hyperparameters |
|---|---|---|---|
| 525 | — | — | — |
| 531 | — | — | `lr_matrix` = 0.02; `lr_embedding` = 0.03; `lr_scalar` = 0.02; `grad_accum_steps` = 8 |
| 536 | — | — | — |
| 559 | 0.04 | — | — |
| 583 | 0.045 | 0.99 | `learning_rates`: matrix = 0.035, tied_embed = 0.045, scalar = 0.035; `momentum_warmup_start` = 0.92; `momentum_warmup_steps` = 1500; `grad_clip_norm` = 0.35; `warmdown_iters` = 2000; `warmup_steps` = 20; `batch_tokens` = 786432; `sequence_length` = 2048 |
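The PR 583 row implies several schedules and derived quantities. The sketch below shows one plausible reading of them. It is an illustration under assumptions, not the PR's code:

- The momentum ramp from 0.92 to 0.99 over 1500 steps is assumed to be linear.
- `warmdown_iters` is assumed to mean a linear LR decay to zero over the final 2000 steps.
- `total_steps` is a hypothetical argument, since no run length is recorded in the table.
- `warmup_steps` is not modeled, because the table does not say what it controls.
- `grad_clip_norm` is read as global L2-norm clipping.

The function names are invented for illustration.

```python
# Hypothetical sketch of schedules implied by the PR 583 row; names and curve shapes are assumptions.

BASE_LR = {"matrix": 0.035, "tied_embed": 0.045, "scalar": 0.035}  # PR 583 learning rates


def muon_momentum(step, start=0.92, end=0.99, warmup_steps=1500):
    """Assumed linear ramp of Muon momentum from `start` to `end`, then held at `end`."""
    frac = min(step / warmup_steps, 1.0)
    return (1 - frac) * start + frac * end


def warmdown_multiplier(step, total_steps, warmdown_iters=2000):
    """Assumed LR multiplier: 1.0, then linear decay to 0 over the last `warmdown_iters` steps."""
    warmdown_start = total_steps - warmdown_iters
    if step < warmdown_start:
        return 1.0
    return max(total_steps - step, 0) / warmdown_iters


def learning_rate(group, step, total_steps):
    """Base LR of a parameter group scaled by the warmdown multiplier."""
    return BASE_LR[group] * warmdown_multiplier(step, total_steps)


def clip_by_global_norm(grads, max_norm=0.35):
    """Rescale a flat list of gradient values so their L2 norm is at most `max_norm`."""
    total = sum(g * g for g in grads) ** 0.5
    if total <= max_norm:
        return list(grads)
    return [g * max_norm / total for g in grads]


def sequences_per_batch(batch_tokens=786432, sequence_length=2048):
    """Full-length sequences per batch: 786432 / 2048 = 384."""
    assert batch_tokens % sequence_length == 0
    return batch_tokens // sequence_length
```

Under these assumptions, momentum is 0.955 halfway through its ramp (step 750). For a hypothetical 10,000-step run, the tied-embedding LR halves to 0.0225 at step 9000. Each batch of 786,432 tokens holds 384 sequences of 2048 tokens.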