PR #1516
Non-Record: Polar Express Muon negative result (1.0805 BPB, +0.0004 vs standard NS5)
Status: open
by dexhunter
val_bpb: 1.0805
Architecture: Transformer
Optimizer: Muon
Artifact Size: 16.00 MB
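val_bpb is validation bits per byte. The sketch below shows the usual conversion. It assumes the evaluator sums next-token cross-entropy in nats and divides by the UTF-8 byte count of the scored text. The evaluation code is not part of this card, and `bits_per_byte` is a hypothetical helper.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed next-token loss (natural log) into bits per byte."""
    # nats -> bits: divide by ln(2). Normalizing by bytes instead of tokens
    # lets models with different tokenizers be compared on equal footing.
    return total_nll_nats / (math.log(2) * total_bytes)


# Hypothetical toy numbers, not from this run: 1000 tokens spanning 4000 bytes
# with a mean loss of 3.0 nats per token.
print(round(bits_per_byte(3.0 * 1000, 4000), 4))
```

On this scale, the "+0.0004" in the title is a difference in this per-byte quantity between the Polar Express run and the standard NS5 run.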
Training Techniques
Optimizer
Muon
weight_decay: null
momentum: 0.97
other_params: {"variant":"Polar Express","orthogonalization":"Newton-Schulz alternative polynomial coefficients"}
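Muon orthogonalizes each 2D update with a few Newton-Schulz iterations. Each iteration applies an odd quintic polynomial p(s) = a·s + b·s³ + c·s⁵ to every singular value while leaving the singular vectors unchanged. The sketch below is pure Python with no framework. It uses the widely published NS5 coefficients (3.4445, -4.7750, 2.0315), repeated for five steps.

A Polar Express variant would instead pass a schedule with a different (a, b, c) at each step. The precomputed coefficients used in this PR are not listed on the card, so none are reproduced here. `orthogonalize`, `NS5_COEFFS`, and the matrix helpers are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def axpby(alpha: float, x: Matrix, beta: float, y: Matrix) -> Matrix:
    # Elementwise alpha * x + beta * y.
    return [[alpha * a + beta * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


# Standard Muon NS5: the same quintic coefficients reused at all five steps.
# A Polar Express schedule would list a different (a, b, c) per step (not shown).
NS5_COEFFS: List[Tuple[float, float, float]] = [(3.4445, -4.7750, 2.0315)] * 5


def orthogonalize(g: Matrix, coeffs: Sequence[Tuple[float, float, float]],
                  eps: float = 1e-7) -> Matrix:
    # Scale by the Frobenius norm so every singular value starts at or below 1.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / (norm + eps) for v in row] for row in g]
    tall = len(x) > len(x[0])
    if tall:  # iterate on the wide orientation so X @ X^T is the smaller Gram matrix
        x = transpose(x)
    for a, b, c in coeffs:
        gram = matmul(x, transpose(x))                  # A = X X^T
        poly = axpby(b, gram, c, matmul(gram, gram))    # B = b*A + c*A^2
        x = axpby(a, x, 1.0, matmul(poly, x))           # X <- a*X + B X
    return transpose(x) if tall else x
```

Because only the coefficient schedule changes, swapping NS5 for a per-step schedule is a one-argument change. That keeps the comparison in this PR focused on the polynomial coefficients alone.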
Test-Time Training
score-first TTT
parameters: null
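The card lists score-first TTT without parameters, so the chunk size, learning rate, and step count are unknown. The sketch below assumes the usual meaning of "score-first": each validation chunk is scored with the current model before the model adapts on that chunk, so no token is ever scored by weights that have already trained on it. To keep the example runnable, it swaps in a toy add-one-smoothed unigram model for the Transformer. `UnigramModel` and `score_first_ttt` are hypothetical names.

```python
import math
from collections import Counter
from typing import Iterable, List


class UnigramModel:
    """Toy stand-in for the language model (hypothetical, illustration only)."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.counts: Counter = Counter()
        self.total = 0

    def nll(self, token: int) -> float:
        # Add-one smoothing keeps the loss finite for tokens not yet seen.
        p = (self.counts[token] + 1) / (self.total + self.vocab_size)
        return -math.log(p)

    def update(self, tokens: Iterable[int]) -> None:
        # Stand-in for a gradient step: absorb the chunk into the counts.
        for t in tokens:
            self.counts[t] += 1
            self.total += 1


def score_first_ttt(model: UnigramModel, chunks: List[List[int]]) -> float:
    """Mean per-token loss (nats) with adaptation applied only after scoring."""
    total_nll, n = 0.0, 0
    for chunk in chunks:
        # 1) Score the chunk with the model as it stands.
        total_nll += sum(model.nll(t) for t in chunk)
        n += len(chunk)
        # 2) Only then adapt on tokens that have already been scored.
        model.update(chunk)
    return total_nll / n
```

Reversing steps 1 and 2 would let the model train on tokens before they are scored, which would lower the reported loss without reflecting real predictive ability.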
Sequence Length
sequence_length
train_length: 8192
eval_length: null
Novel Contributions
- Evaluates Polar Express as an alternative to Muon's standard Newton-Schulz (NS5) orthogonalization
- Shows that Polar Express Muon is slightly worse than standard Muon NS5 on this stack (1.0805 val_bpb, +0.0004 BPB)
- Documents a negative result to discourage further testing of this direction on the sp8192 stack
- Uses precomputed polynomial coefficients for the Polar Express variant