PR #2117

open

Record support: 3-seed reproduction of PR #2101 + 3 ablations — val_b…

by JulianTang2027
val_bpb
1.0588
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,983,343 B

Training Techniques

Quantization
GPTQ
bits: null
scope: artifact/model
Architecture
weight tying
The PR does not explicitly mention weight tying; it would apply only if implied by the base code, which is not confirmed.
parameters: null
SmearGate
Inherited from prior PR lineage as part of the reproduced stack.
parameters: null
Test-Time Training
LoRA TTT
parameters: {"rank":80,"alpha":144,"chunk_size":48,"local_lr_mult":0.75,"mask":"no_qv","phases":3,"prefix_docs":2500}
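
To make the rank and alpha values above concrete, here is a minimal sketch of how a LoRA update is conventionally scaled. It assumes the standard alpha / rank convention, which the PR summary does not confirm; the helpers `lora_scale` and `lora_delta` are hypothetical names introduced for illustration, and the other keys (such as `mask` and `local_lr_mult`) are carried along without interpreting them.

```python
# Sketch only: conventional LoRA scaling, assumed rather than taken from the PR's code.
ttt_config = {
    "rank": 80, "alpha": 144, "chunk_size": 48, "local_lr_mult": 0.75,
    "mask": "no_qv", "phases": 3, "prefix_docs": 2500,
}

def lora_scale(cfg):
    """Return alpha / rank, the multiplier conventionally applied to B @ A."""
    return cfg["alpha"] / cfg["rank"]

def lora_delta(B, A, scale):
    """Compute scale * (B @ A) for small list-of-lists matrices.

    B has shape (out, rank) and A has shape (rank, in); the result is the
    low-rank update that would be added to a frozen weight of shape (out, in).
    """
    rows, inner, cols = len(B), len(A), len(A[0])
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

scale = lora_scale(ttt_config)  # 144 / 80
```

Under this assumed convention the adapter output is multiplied by 144 / 80 = 1.8 before being added to the frozen weight.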
Regularization
label smoothing
parameters: {"label_smooth":0.05}
gradient centralization
parameters: null
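
For readers unfamiliar with the two regularization knobs, the following is a minimal pure-Python sketch of what they conventionally compute. It assumes the common formulations (uniform label smoothing over all classes, and gradient centralization as subtracting each output row's mean from a weight gradient); the PR's actual implementation is not shown in this summary, and the function names are hypothetical.

```python
import math

def smoothed_targets(label, num_classes, eps=0.05):
    """Mix a one-hot target with a uniform distribution: (1 - eps) * onehot + eps / K."""
    base = eps / num_classes
    return [base + (1.0 - eps) * (1.0 if i == label else 0.0) for i in range(num_classes)]

def smoothed_cross_entropy(logits, label, eps=0.05):
    """Cross-entropy of softmax(logits) against the smoothed target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(t * lp for t, lp in zip(targets, log_probs))

def centralize_gradient(grad):
    """Subtract the row mean from each row of a 2-D gradient (list of lists).

    Each row is assumed to hold the gradient for one output unit, so every
    centralized row sums to zero.
    """
    out = []
    for row in grad:
        mu = sum(row) / len(row)
        out.append([g - mu for g in row])
    return out
```

With `eps=0.05` and four classes, the true class receives a target of 0.9625 and each other class 0.0125.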
Other
other
Independent 3-seed reproduction of PR #2101 and ablation study of unused knobs.
parameters: {"seeds":[42,0,1234]}

Novel Contributions

  • Independent 3-seed reproduction of PR #2101
  • Three ablations of unused author-introduced knobs
  • Negative-result evidence that LABEL_SMOOTH and GRAD_CENTRALIZE are null on this stack
  • Probe showing TTT_LOCAL_LR_MULT=0.85 is worse than the baseline 0.75 setting
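
A multi-seed comparison like the one listed above is typically summarized as a mean and spread per configuration. The sketch below shows one way to do that, assuming each configuration is evaluated on the same seeds and that lower val_bpb is better; `summarize` and `ablation_delta` are hypothetical helpers, and no per-seed numbers are included because this summary does not report them.

```python
from statistics import mean, stdev

SEEDS = [42, 0, 1234]  # seed list from the PR's parameters

def summarize(val_bpb_by_seed):
    """Return the mean and sample standard deviation of val_bpb over SEEDS.

    val_bpb_by_seed maps seed -> val_bpb and must be filled in from run logs.
    """
    missing = [s for s in SEEDS if s not in val_bpb_by_seed]
    if missing:
        raise ValueError(f"missing results for seeds: {missing}")
    values = [val_bpb_by_seed[s] for s in SEEDS]
    return {"mean": mean(values), "std": stdev(values), "n": len(values)}

def ablation_delta(baseline, ablation):
    """Mean val_bpb of the ablation minus the baseline.

    A positive value means the ablation is worse; a value small relative to
    the per-seed standard deviation suggests the knob has no measurable effect.
    """
    return summarize(ablation)["mean"] - summarize(baseline)["mean"]
```

Comparing the delta against the baseline's seed-to-seed standard deviation is one simple way to judge whether a knob is effectively null.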