Muon (matrix), Adam (scalar/embed)

Optimizer
Used in: 1 PR
Best BPB: 1.1828
Avg BPB: 1.1828

Hyperparameters Across PRs

pr_number    weight_decay    momentum    other_params
599          -               -           {"matrix_lr":0.02,"scalar_lr":0.02}
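The page does not say how parameters are divided between the two optimizers. The following is a minimal sketch of one plausible reading of the "Muon (matrix), Adam (scalar/embed)" label, assuming that 2-D non-embedding weights go to Muon at `matrix_lr` and everything else (embeddings, gains, biases) goes to Adam at `scalar_lr`. The function name `split_param_groups`, the name-based embedding check, and the example model layout are hypothetical; only the two learning rates come from the other_params value listed for PR 599.

```python
import json

# other_params value recorded for PR 599 on this page
OTHER_PARAMS = json.loads('{"matrix_lr":0.02,"scalar_lr":0.02}')


def split_param_groups(named_shapes, other_params=OTHER_PARAMS):
    """Partition parameter names into a Muon group and an Adam group.

    Assumption (not stated on the page): a parameter counts as "matrix"
    when it is 2-D and its name does not contain "embed".
    """
    muon, adam = [], []
    for name, shape in named_shapes.items():
        is_matrix = len(shape) == 2 and "embed" not in name
        (muon if is_matrix else adam).append(name)
    return {
        "muon": {"params": muon, "lr": other_params["matrix_lr"]},
        "adam": {"params": adam, "lr": other_params["scalar_lr"]},
    }


# Hypothetical model layout, for illustration only
shapes = {
    "embed.weight": (1024, 64),
    "block0.attn.weight": (64, 64),
    "block0.norm.gain": (64,),
}
groups = split_param_groups(shapes)
```

The sketch works on names and shapes only, so it runs without any deep-learning library; in a real training script the two groups would be handed to the respective optimizer implementations.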