PR #2117
open
Record support: 3-seed reproduction of PR #2101 + 3 ablations — val_b…
by JulianTang2027
val_bpb
1.0588
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,983,343 B
Training Techniques
Quantization
GPTQ
bits: null
scope: artifact/model
Architecture
weight tying
The PR does not mention weight tying explicitly; it would apply only if inherited from the base code, which the PR text does not confirm.
parameters: null
SmearGate
Inherited from prior PR lineage as part of the reproduced stack.
parameters: null
Test-Time Training
LoRA TTT
parameters: {"rank":80,"alpha":144,"chunk_size":48,"local_lr_mult":0.75,"mask":"no_qv","phases":3,"prefix_docs":2500}
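As a rough illustration of what the rank and alpha values above imply, the sketch below applies the conventional LoRA update y = W x + (alpha / rank) * B (A x) on toy dimensions. The alpha / rank scaling rule, the zero initialization of B, and the helper names (lora_scale, matvec, lora_forward) are assumptions for illustration and are not taken from the PR. The remaining fields (chunk_size, local_lr_mult, mask, phases, prefix_docs) are parsed but left uninterpreted.

```python
import json
import random

# Parameters copied verbatim from the card above.
ttt_params = json.loads(
    '{"rank":80,"alpha":144,"chunk_size":48,"local_lr_mult":0.75,'
    '"mask":"no_qv","phases":3,"prefix_docs":2500}'
)


def lora_scale(rank, alpha):
    """Conventional LoRA scaling factor alpha / rank (assumed, not confirmed by the PR)."""
    return alpha / rank


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, scale):
    """y = W x + scale * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base, delta)]


rng = random.Random(0)
d_in, d_out, r = 4, 3, ttt_params["rank"]  # toy layer sizes, hypothetical
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [1.0, -2.0, 0.5, 3.0]

scale = lora_scale(ttt_params["rank"], ttt_params["alpha"])
y = lora_forward(w, a, b, x, scale)
print("LoRA scale:", scale)
```

Under this convention, rank 80 with alpha 144 gives a scale of 1.8, and with B at zero the adapted layer reproduces the frozen layer until test-time updates move B.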
Regularization
label smoothing
parameters: {"label_smooth":0.05}
gradient centralization
parameters: null
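For readers unfamiliar with the two regularizers listed above, here is a minimal sketch of each in plain Python. It assumes the common formulations: label smoothing mixes the one-hot target with a uniform distribution over classes, and gradient centralization subtracts the mean of each gradient row. Whether the PR's stack implements them exactly this way, and whether rows correspond to output units there, is an assumption. Function names are hypothetical.

```python
import math


def smoothed_cross_entropy(logits, target, eps=0.05):
    """Cross-entropy against the target (1 - eps) * one_hot + eps * uniform."""
    m = max(logits)  # shift by the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    nll = -log_probs[target]                      # one-hot term
    uniform = -sum(log_probs) / len(logits)       # uniform-target term
    return (1.0 - eps) * nll + eps * uniform


def centralize_gradient(grad):
    """Subtract each row's mean so that every gradient row sums to zero."""
    out = []
    for row in grad:
        mean = sum(row) / len(row)
        out.append([g - mean for g in row])
    return out


logits = [2.0, 0.5, -1.0]
loss_plain = smoothed_cross_entropy(logits, target=0, eps=0.0)
loss_smooth = smoothed_cross_entropy(logits, target=0, eps=0.05)  # label_smooth from the card

grad = [[1.0, 2.0, 3.0], [0.5, 0.5, 2.0]]  # toy gradient, hypothetical values
centered = centralize_gradient(grad)
print(loss_plain, loss_smooth, centered)
```

With eps set to 0 the loss reduces to ordinary cross-entropy, which is the natural baseline for an ablation of the label-smoothing knob.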
Other
other
Independent 3-seed reproduction of PR #2101 and ablation study of unused knobs.
parameters: {"seeds":[42,0,1234]}
Novel Contributions
- Independent 3-seed reproduction of PR #2101
- Three ablations of unused author-introduced knobs
- Negative-result evidence that LABEL_SMOOTH and GRAD_CENTRALIZE have no measurable effect on this stack
- Probe showing TTT_LOCAL_LR_MULT=0.85 is worse than the baseline 0.75 setting