NorMuon
Optimizer

- Used in: 13 PRs
- Best BPB: 1.0824
- Avg BPB: 1.2029

Submissions

| PR | Author | BPB |
|---|---|---|
| #78 | mtybadger | 1.1858 |
| #89 | vmfunc | 1.1622 |
| #103 | MatthewHRockwell | 1.5000 |
| #116 | abhishekgahlot2 | 1.1666 |
| #122 | mtybadger | 1.1603 |
| #137 | abhishekgahlot2 | 1.1666 |
| #156 | dexhunter | 1.1602 |
| #173 | tamoghnokandar | 1.1532 |
| #204 | Akasxh | 1.2320 |
| #206 | dexhunter | 1.1507 |
| #418 | yashverms | 1.1715 |
| #438 | stevenshinechen | 1.3458 |
| #1520 | taka6745 | 1.0824 |

Hyperparameters Across PRs

| pr_number | weight_decay | momentum | other_params |
|---|---|---|---|
| 78 | — | — | — |
| 89 | — | 0.99 | {"matrix_lr":0.02,"scalar_lr":0.02,"tied_embed_lr":0.03,"muon_momentum_warmup_steps":1500,"muon_momentum_warmup_start":0.92} |
| 103 | — | 0.99 | {"matrix_lr":0.02,"scalar_lr":0.02,"tied_embed_lr":0.03,"warmup_start":0.92,"warmup_steps":1500} |
| 116 | 0.01 | 0.99 | {"matrix_lr":0.02,"grad_clip":0.3,"momentum_warmup_start":0.92,"momentum_warmup_steps":1500} |
| 122 | — | — | — |
| 137 | 0.01 | 0.99 | {"matrix_lr":0.02,"grad_clip":0.3,"momentum_warmup_start":0.92,"momentum_warmup_steps":1500} |
| 156 | — | 0.99 | {"beta2":0.95,"matrix_lr":0.02,"warmdown_iters":3000,"momentum_warmup_steps":1500,"momentum_warmup_start":0.92} |
| 173 | — | 0.99 | {"matrix_lr":0.02,"scalar_lr":0.02,"tied_embed_lr":0.03,"muon_momentum_warmup_start":0.92,"muon_momentum_warmup_steps":1500} |
| 204 | 0.02 | — | {"beta2":0.95} |
| 206 | 0.02 | 0.99 | {"beta2":0.95,"warmup_start":0.92,"matrix_lr":0.021,"scalar_lr":0.02,"tied_embed_lr":0.03} |
| 418 | 0.02 | 0.95 | {"beta2":0.95,"lr":0.04} |
| 438 | — | — | {"beta2":0.95,"second_momentum_buffer":true,"newton_schulz":true} |
| 1520 | — | — | {"post_ns_row_normalization":true} |
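
Several rows above pair a final momentum of 0.99 with a warmup start of 0.92 over 1500 steps. The table does not say how the value moves between those two endpoints, so the sketch below assumes a plain linear ramp that holds at the final value afterwards. The function name `warmed_up_momentum` and its parameter names are hypothetical, chosen for illustration only; the individual PRs may use a different schedule.

```python
def warmed_up_momentum(step, warmup_steps=1500, start=0.92, end=0.99):
    """Return the momentum to use at `step`.

    Assumption: linear interpolation from `start` to `end` over
    `warmup_steps`, then constant at `end`. The defaults mirror the
    values listed in the table; the linear shape is not confirmed by it.
    """
    if warmup_steps <= 0:
        # No warmup configured: use the final momentum immediately.
        return end
    # Fraction of the warmup completed, clamped to [0, 1].
    frac = min(max(step / warmup_steps, 0.0), 1.0)
    return (1.0 - frac) * start + frac * end


if __name__ == "__main__":
    # Print the momentum at the start, midpoint, end, and after warmup.
    for step in (0, 750, 1500, 3000):
        print(step, round(warmed_up_momentum(step), 4))
```

Under this assumption the momentum is 0.92 at step 0, reaches 0.99 at step 1500, and stays there. A training loop would typically call such a function once per step and write the result into the optimizer's momentum setting before the update.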