PR #1874
open
Record: PR #1790 + Polar Express NS + MIN_LR + LQER Asym Rank-4 — val_bpb 1.06766 (3-seed mean)
by AjAnubolu
val_bpb
1.0677
Architecture
Transformer
Optimizer
Muon
Artifact Size
~15.95 MB
Training Techniques
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"backend_steps":5,"polar_express_ns":true}
Quantization
GPTQ
bits: 4
scope: top-3 layers residual correction
int4
bits: 4
scope: LQER asymmetric per-group correction
LR Schedule
warmdown
parameters: {"min_lr":0.1}
Test-Time Training
score-first TTT
parameters: {"phased":true,"rank":4,"top_k":3}
Architecture
SmearGate
Gate mechanism inherited from the PR #1790 base stack.
parameters: null
Gated Attention
Attention output gating inherited from the PR #1790 base stack.
parameters: {"width":24}
Novel Contributions
- Polar Express Newton-Schulz coefficients with per-iteration minimax-tuned tuples
- MIN_LR=0.10 warmdown floor to keep late training updates productive
- LQER asymmetric rank-4 quantization correction on the top-3 GPTQ residual layers
- Combined improvement over PR #1790 to 1.06766 val_bpb on a 3-seed mean
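The Newton-Schulz item above can be illustrated with a minimal NumPy sketch of the quintic iteration that Muon uses to orthogonalize an update, written so that each iteration takes its own (a, b, c) tuple. The function name and the coefficient values are assumptions: the tuples below are the classic Muon coefficients repeated five times (matching backend_steps: 5) as a placeholder, not the minimax-tuned Polar Express values from this PR.

```python
import numpy as np

# Placeholder schedule (assumption): the classic Muon quintic coefficients,
# repeated once per backend step. Polar Express would supply a different
# tuned tuple for each iteration; those values are not reproduced here.
PLACEHOLDER_COEFFS = [(3.4445, -4.7750, 2.0315)] * 5


def newton_schulz_orthogonalize(grad, coeffs=PLACEHOLDER_COEFFS, eps=1e-7):
    """Push the singular values of a 2-D matrix toward 1 (hypothetical helper)."""
    x = np.asarray(grad, dtype=np.float64)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # work in the wide orientation so x @ x.T is the smaller Gram matrix
    x = x / (np.linalg.norm(x) + eps)  # Frobenius normalization keeps singular values <= 1
    for a, b, c in coeffs:
        gram = x @ x.T
        # Odd quintic polynomial in x: a*x + b*(x x^T) x + c*(x x^T)^2 x
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


rng = np.random.default_rng(0)
update = newton_schulz_orthogonalize(rng.standard_normal((16, 32)))
singular_values = np.linalg.svd(update, compute_uv=False)
```

Passing a list of tuples rather than a single triple is the only structural change needed to swap in per-iteration coefficients; the loop body stays the same.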
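The MIN_LR item above can be read as a warmdown that stops at a fraction of the peak learning rate instead of decaying to zero. The sketch below is one possible shape, assuming a linear warmdown and that min_lr is a multiplier on the peak rate; the function name and the step counts are hypothetical, and the PR may implement the floor differently.

```python
def lr_multiplier(step, total_steps, warmdown_steps, min_lr=0.1):
    """Scale factor applied to the peak learning rate (hypothetical helper).

    Assumes warmdown_steps > 0 and a linear ramp from 1.0 down to min_lr.
    """
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0  # constant phase before warmdown begins
    progress = min((step - warmdown_start) / warmdown_steps, 1.0)
    return 1.0 - (1.0 - min_lr) * progress  # ends at min_lr rather than 0
```

With total_steps=1000 and warmdown_steps=200, the multiplier stays at 1.0 through step 799 and ends at 0.1, so the final updates still move the weights.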
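The LQER item above rests on a simple idea: approximate the quantization error W - Q(W) with a low-rank product and add it back at inference. The sketch below shows only that core step with rank 4. It is an illustration under stated assumptions: plain round-to-nearest 4-bit quantization stands in for GPTQ, the asymmetric per-group handling of the correction is not reproduced, and all function names and the group size are hypothetical.

```python
import numpy as np


def quantize_rtn(weight, bits=4, group_size=32):
    """Symmetric round-to-nearest quantization, one scale per group (GPTQ stand-in)."""
    qmax = 2 ** (bits - 1) - 1
    groups = weight.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero groups
    dequantized = np.clip(np.round(groups / scale), -qmax - 1, qmax) * scale
    return dequantized.reshape(weight.shape)


def low_rank_correction(weight, weight_q, rank=4):
    """Factors (a, b) whose product is the best rank-`rank` fit to the quantization error."""
    u, s, vt = np.linalg.svd(weight - weight_q, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))
w_q = quantize_rtn(w)
a, b = low_rank_correction(w, w_q, rank=4)
err_plain = np.linalg.norm(w - w_q)
err_corrected = np.linalg.norm(w - (w_q + a @ b))
```

Because the truncated SVD is the best rank-4 approximation of the error in Frobenius norm, err_corrected cannot exceed err_plain; how much it helps on real weights depends on how concentrated the error spectrum is.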