MuonAdamW

Optimizer
Used in: 2 PRs
Best BPB: 1.5918
Avg BPB: 1.7199

Hyperparameters Across PRs

pr_number | weight_decay | momentum | other_params
220 | - | - | {"param_groups": "SSM-aware parameter groups; Adam for A/B/C/D and Muon for projections"}
1239 | 0.12 | 0.85 | {"lr": 0.04}
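
The "SSM-aware parameter groups" note for PR 220 can be read as a name-based split of the model's parameters. The following is a minimal sketch of that idea, not the PR's actual code: the parameter names (`A_log`, `in_proj`, and so on), the `split_param_groups` helper, and the rule that sends non-2D tensors to Adam are all assumptions made for illustration.

```python
# Hypothetical leaf names for the SSM core parameters (A/B/C/D).
# Real SSM modules may name these differently.
SSM_CORE_NAMES = ("A", "A_log", "B", "C", "D")


def split_param_groups(named_shapes):
    """Partition (name, shape) pairs into an Adam group and a Muon group.

    Assumed rule: SSM core parameters and any non-2D tensor go to Adam;
    the remaining 2D projection matrices go to Muon.
    """
    adam, muon = [], []
    for name, shape in named_shapes:
        leaf = name.rsplit(".", 1)[-1]  # last component of the dotted name
        if leaf in SSM_CORE_NAMES or len(shape) != 2:
            adam.append(name)
        else:
            muon.append(name)
    return {"adam": adam, "muon": muon}


# Illustrative parameter list; names and shapes are made up.
example = [
    ("layers.0.ssm.A_log", (16, 8)),
    ("layers.0.ssm.D", (16,)),
    ("layers.0.in_proj.weight", (64, 32)),
    ("layers.0.out_proj.weight", (32, 64)),
    ("layers.0.norm.weight", (32,)),
]
groups = split_param_groups(example)
for optimizer_name, names in groups.items():
    print(optimizer_name, names)
```

In a real training script, each group would then be handed to its own optimizer with its own hyperparameters; the table does not say which values PR 220 used, so none are shown here.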