MuonAdamW
Optimizer
Used in: 2 PRs
Best BPB (bits per byte, lower is better): 1.5918
Average BPB: 1.7199
Hyperparameters Across PRs
| PR | Weight decay | Momentum | Other parameters |
|---|---|---|---|
| 220 | n/a | n/a | SSM-aware parameter groups: Adam for A/B/C/D, Muon for projections |
| 1239 | 0.12 | 0.85 | Learning rate: 0.04 |
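PR 220 describes its setup only as "SSM-aware parameter groups". The sketch below shows one way such a split could be done, under several assumptions. It uses PyTorch-style dotted parameter names and routes each parameter by name and shape. The SSM matrices A/B/C/D go to AdamW, along with any non-2-D tensor. 2-D projection weights go to Muon. The names `split_param_groups` and `ParamGroup`, the `SSM_PARAM_NAMES` set, and the rule that embeddings and the output head also go to AdamW are all hypothetical. To stay dependency-free, it takes shapes rather than real tensors. Attaching PR 1239's values (lr 0.04, momentum 0.85, weight decay 0.12) to the Muon group is also an assumption, since the table does not say which group they apply to.

```python
from dataclasses import dataclass, field

# Hypothetical leaf names of SSM state-space parameters (A/B/C/D in PR 220);
# real models may name these differently.
SSM_PARAM_NAMES = {"A", "B", "C", "D"}
# Assumed: embeddings and the output head stay on AdamW, as is common with Muon.
ADAMW_NAME_TAGS = ("embed", "lm_head")


@dataclass
class ParamGroup:
    optimizer: str
    names: list = field(default_factory=list)
    hparams: dict = field(default_factory=dict)


def split_param_groups(shapes, muon_hparams, adamw_hparams):
    """Route each parameter name to a Muon group or an AdamW group.

    shapes: mapping of dotted parameter name -> shape tuple.
    Assumed rule: SSM A/B/C/D tensors, non-2-D tensors, and embedding/head
    weights go to AdamW; remaining 2-D projection matrices go to Muon.
    """
    muon = ParamGroup("muon", hparams=dict(muon_hparams))
    adamw = ParamGroup("adamw", hparams=dict(adamw_hparams))
    for name, shape in shapes.items():
        leaf = name.rsplit(".", 1)[-1]
        is_ssm = leaf in SSM_PARAM_NAMES
        is_excluded = any(tag in name for tag in ADAMW_NAME_TAGS)
        if is_ssm or is_excluded or len(shape) != 2:
            adamw.names.append(name)
        else:
            muon.names.append(name)
    return [muon, adamw]


# Toy shapes for one hypothetical SSM block; note that "A" is 2-D but is
# still routed to AdamW because of its name.
shapes = {
    "layers.0.mixer.in_proj.weight": (512, 256),
    "layers.0.mixer.out_proj.weight": (256, 512),
    "layers.0.mixer.A": (512, 16),
    "layers.0.mixer.D": (512,),
    "layers.0.norm.weight": (256,),
    "embed.weight": (1024, 256),
}

# Values from the PR 1239 row, assumed here to belong to the Muon group.
muon_hparams = {"lr": 0.04, "momentum": 0.85, "weight_decay": 0.12}
# The source lists no AdamW hyperparameters, so none are filled in.
adamw_hparams = {}

groups = split_param_groups(shapes, muon_hparams, adamw_hparams)
for group in groups:
    print(group.optimizer, group.names, group.hparams)
```

Routing by name keeps 2-D SSM tensors such as `A` away from Muon's orthogonalized update, which is designed for dense weight matrices. The shape check catches biases and norm scales without listing them.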