
NeoMuon

Optimizer
Used in: 2 PRs
Best BPB: 1.1239
Avg BPB: 1.1404

Hyperparameters Across PRs

| pr_number | weight_decay | momentum | other_params |
| --- | --- | --- | --- |
| 640 | 0 | 0.95 | {"backend_steps":3,"momentum_warmup_start":0.85,"momentum_warmup_steps":500,"adam_lr":0.05,"adam_wd":0.05,"matrix_lr":0.04,"scalar_lr":0.02,"tied_embed_lr":0.02} |
| 641 | 0 | 0.95 | {"muon_backend_steps":3,"muon_momentum_warmup_start":0.85,"muon_momentum_warmup_steps":500} |
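
Both PRs list a momentum warmup start of 0.85 and 500 warmup steps alongside a final momentum of 0.95. The page does not say what shape the warmup takes, so the sketch below assumes a linear ramp, which is a common choice in Muon-style training scripts. The function name `warmup_momentum` and its argument names are hypothetical and are not taken from either PR.

```python
def warmup_momentum(step: int,
                    start: float = 0.85,
                    end: float = 0.95,
                    warmup_steps: int = 500) -> float:
    """Return the momentum to use at a given training step.

    Assumption: momentum ramps linearly from `start` to `end` over
    `warmup_steps` steps and then stays at `end`. The defaults mirror
    the values listed in the table; the linear shape is a guess.
    """
    if warmup_steps <= 0:
        # No warmup configured: use the final momentum immediately.
        return end
    # Fraction of the warmup completed, clamped to [0, 1].
    frac = min(max(step, 0) / warmup_steps, 1.0)
    return (1.0 - frac) * start + frac * end


if __name__ == "__main__":
    for step in (0, 250, 500, 1000):
        print(step, round(warmup_momentum(step), 4))
```

Under these assumptions the momentum starts at 0.85, reaches roughly 0.90 halfway through the warmup, and holds at 0.95 from step 500 onward. In a PyTorch-style training loop, a value like this would typically be written into the optimizer's parameter groups before each step.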