
Parameter Banking + Parallel Muon

Category: Optimizer
Used in: 1 PR
Best BPB (bits per byte): 1.1552
Avg BPB: 1.1552

Hyperparameters Across PRs

| pr_number | weight_decay | momentum | other_params |
|---|---|---|---|
| 653 | 0.04 | 0.99 | {"adam_wd":0.04,"muon_momentum_warmup_start":0.92,"muon_momentum_warmup_steps":1500,"matrix_lr":0.025,"scalar_lr":0.025,"tied_embed_lr":0.035} |
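The table gives a Muon momentum warmup start (0.92), a warmup length (1500 steps) and a final momentum (0.99), but not the shape of the schedule. The sketch below assumes a linear ramp from the start value to the final value, clamped once warmup ends, which is a common pattern in Muon training scripts; the function name `muon_momentum` is hypothetical and not taken from PR 653.

```python
def muon_momentum(step: int,
                  warmup_start: float = 0.92,
                  final: float = 0.99,
                  warmup_steps: int = 1500) -> float:
    """Hypothetical Muon momentum schedule (assumed linear warmup).

    Interpolates from warmup_start at step 0 to final at warmup_steps,
    then holds at final for the rest of training.
    """
    if warmup_steps <= 0:
        # No warmup configured: use the final momentum from the start.
        return final
    # Fraction of warmup completed, clamped to 1.0 after warmup ends.
    frac = min(step / warmup_steps, 1.0)
    return (1.0 - frac) * warmup_start + frac * final


if __name__ == "__main__":
    for s in (0, 750, 1500, 3000):
        print(s, round(muon_momentum(s), 4))
```

Under this assumption, momentum is 0.92 at step 0, about 0.955 halfway through warmup (step 750), and stays at 0.99 from step 1500 onward.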