
Muon (matrices) and AdamW (embeddings and scalars)
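The optimizer name describes a split of the model's parameters: 2-D weight matrices are updated by Muon, while embeddings and scalar parameters are updated by AdamW. Below is a minimal sketch of that routing. The name-based embedding test (`"embed"` in the name), the `Param` type, and the example parameter names are hypothetical, and treating 1-D vectors (gains, scales) as "scalars" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Param:
    # Hypothetical stand-in for a model parameter: just a name and a shape.
    name: str
    shape: tuple


def route_params(params):
    """Assign each parameter to one optimizer group (hypothetical rule)."""
    groups = {"muon": [], "adamw_embeddings": [], "adamw_scalars": []}
    for p in params:
        if "embed" in p.name:
            # Assumption: embedding tables are recognized by name, not shape.
            groups["adamw_embeddings"].append(p.name)
        elif len(p.shape) == 2:
            # Remaining 2-D weight matrices go to Muon.
            groups["muon"].append(p.name)
        else:
            # 0-D scalars and 1-D vectors go to AdamW (assumption for vectors).
            groups["adamw_scalars"].append(p.name)
    return groups


example = [
    Param("tok_embed.weight", (1024, 512)),
    Param("blocks.0.attn.qkv.weight", (1536, 512)),
    Param("blocks.0.mlp.fc.weight", (2048, 512)),
    Param("blocks.0.attn_scale", (512,)),
    Param("logit_temperature", ()),
]
groups = route_params(example)
```

The embedding check runs before the 2-D check because embedding tables are also 2-D matrices; without that ordering they would be routed to Muon.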

Optimizer
Used in: 1 PR
Best BPB (bits per byte): 1.1355
Avg BPB: 1.1355

Hyperparameters Across PRs

pr_number | weight_decay | momentum | other_params
579 | (not listed) | 0.99 | {"Muon_lr":0.025,"AdamW_embeddings_lr":0.035,"AdamW_scalars_lr":0.025,"gradient_clip":0.3}
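The sketch below shows one way the PR 579 values could be wired into per-group optimizer settings. Several details are assumptions: the momentum of 0.99 is applied to the Muon group, no weight decay is set because none is listed, and `gradient_clip` is read as a global L2-norm threshold in the style of `torch.nn.utils.clip_grad_norm_`. The helper names `build_group_configs` and `clip_grad_norm` are hypothetical and are not taken from the PR.

```python
import math

# Values copied from the PR 579 row; how they are wired up below is assumed.
HPARAMS = {
    "Muon_lr": 0.025,
    "AdamW_embeddings_lr": 0.035,
    "AdamW_scalars_lr": 0.025,
    "gradient_clip": 0.3,
}
MOMENTUM = 0.99  # assumption: this is the Muon momentum


def build_group_configs(h, momentum):
    """Map the flat hyperparameters onto the three optimizer groups."""
    return {
        "muon": {"lr": h["Muon_lr"], "momentum": momentum},
        "adamw_embeddings": {"lr": h["AdamW_embeddings_lr"]},
        "adamw_scalars": {"lr": h["AdamW_scalars_lr"]},
    }


def clip_grad_norm(grads, max_norm):
    """Rescale all gradients in place so their joint L2 norm is <= max_norm.

    grads is a list of flat lists of floats. Returns the norm before clipping.
    """
    total = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total > max_norm:
        scale = max_norm / total
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= scale
    return total


configs = build_group_configs(HPARAMS, MOMENTUM)
grads = [[3.0, 0.0], [0.0, 4.0]]  # joint L2 norm is 5.0
norm_before = clip_grad_norm(grads, HPARAMS["gradient_clip"])
```

Clipping uses a single norm across every group, so all gradients are scaled by the same factor. This keeps the direction of the full update unchanged and shrinks only its size.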