NorMuon

Optimizer
Used in: 13 PRs
Best BPB: 1.0824
Avg BPB: 1.2029

Hyperparameters Across PRs

| pr_number | weight_decay | momentum | other_params |
|---|---|---|---|
| 78 | | | |
| 89 | | 0.99 | {"matrix_lr":0.02,"scalar_lr":0.02,"tied_embed_lr":0.03,"muon_momentum_warmup_steps":1500,"muon_momentum_warmup_start":0.92} |
| 103 | | 0.99 | {"matrix_lr":0.02,"scalar_lr":0.02,"tied_embed_lr":0.03,"warmup_start":0.92,"warmup_steps":1500} |
| 116 | 0.01 | 0.99 | {"matrix_lr":0.02,"grad_clip":0.3,"momentum_warmup_start":0.92,"momentum_warmup_steps":1500} |
| 122 | | | |
| 137 | 0.01 | 0.99 | {"matrix_lr":0.02,"grad_clip":0.3,"momentum_warmup_start":0.92,"momentum_warmup_steps":1500} |
| 156 | | 0.99 | {"beta2":0.95,"matrix_lr":0.02,"warmdown_iters":3000,"momentum_warmup_steps":1500,"momentum_warmup_start":0.92} |
| 173 | | 0.99 | {"matrix_lr":0.02,"scalar_lr":0.02,"tied_embed_lr":0.03,"muon_momentum_warmup_start":0.92,"muon_momentum_warmup_steps":1500} |
| 204 | 0.02 | | {"beta2":0.95} |
| 206 | 0.02 | 0.99 | {"beta2":0.95,"warmup_start":0.92,"matrix_lr":0.021,"scalar_lr":0.02,"tied_embed_lr":0.03} |
| 418 | 0.02 | 0.95 | {"beta2":0.95,"lr":0.04} |
| 438 | | | {"beta2":0.95,"second_momentum_buffer":true,"newton_schulz":true} |
| 1520 | | | {"post_ns_row_normalization":true} |
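
Several of the PRs listed here pair a momentum of 0.99 with a warmup start of 0.92 and 1500 warmup steps. The page does not say how the warmup is shaped, so the following is only a minimal sketch that assumes a linear ramp from the start value to the final momentum. The function name `warmed_up_momentum` and its signature are hypothetical and are not taken from any of the PRs.

```python
def warmed_up_momentum(step, start=0.92, end=0.99, warmup_steps=1500):
    """Return the momentum to use at `step`.

    Assumption: the warmup is a linear interpolation from `start` to `end`
    over `warmup_steps` optimizer steps, after which momentum stays at `end`.
    The defaults mirror the values that recur in the table (0.92, 0.99, 1500).
    """
    if warmup_steps <= 0:
        # No warmup configured: use the final momentum immediately.
        return end
    # Fraction of the warmup completed, clamped to [0, 1].
    frac = min(max(step / warmup_steps, 0.0), 1.0)
    return (1.0 - frac) * start + frac * end


if __name__ == "__main__":
    # Print the assumed schedule at a few representative steps.
    for step in (0, 750, 1500, 3000):
        print(step, round(warmed_up_momentum(step), 4))
```

Under this assumption the momentum is 0.92 at step 0, reaches 0.99 at step 1500, and stays there. A PR that uses a different ramp shape would need a different interpolation.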