PR #1741
Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix
by amrayach
val_bpb
1.0722
Architecture
Transformer
Optimizer
—
Artifact Size
15,999,394 bytes
Training Techniques
Test-Time Training
LoRA TTT
parameters: {"rank":null,"mode":"phased","eval_only":true}
Other
other
Score-first n-gram posterior corrector layered on top of phased LoRA TTT; uses Laplace-smoothed unigram and n-gram counts to add a logit bias at eval time.
parameters: {"alpha":[0.1,0.3],"orders":[5,8,12]}
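The corrector idea can be sketched in a few lines: count (context, next-token) pairs at a fixed n-gram order, then turn Laplace-smoothed posteriors into a per-token logit bias. This is a minimal illustration of the technique, not the PR's actual code; the function names and the flat list-of-token-ids interface are assumptions.

```python
import math
from collections import Counter, defaultdict

def build_ngram_counts(tokens, order):
    """Count (context, next-token) occurrences for a fixed n-gram order.

    `tokens` is a flat list of token ids (an assumed interface, not the
    PR's actual data layout)."""
    counts = defaultdict(Counter)
    for i in range(order - 1, len(tokens)):
        ctx = tuple(tokens[i - order + 1:i])
        counts[ctx][tokens[i]] += 1
    return counts

def ngram_logit_bias(context, counts, vocab_size, alpha=0.1):
    """Laplace-smoothed log-posterior over the next token, usable as a bias
    added to the model's logits at eval time. `alpha` matches the swept
    smoothing parameter in the PR's config."""
    ctx_counts = counts.get(tuple(context), Counter())
    total = sum(ctx_counts.values())
    return [
        math.log((ctx_counts[t] + alpha) / (total + alpha * vocab_size))
        for t in range(vocab_size)
    ]
```

With `alpha` and the n-gram order drawn from the swept grid (`alpha ∈ {0.1, 0.3}`, `orders ∈ {5, 8, 12}`), the bias would be added to the transformer's output logits before scoring; the PR reports that this degrades BPB in all tested settings.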
Quantization
int6
bits: 6
scope: model
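For context on the int6 setting above, here is a minimal sketch of symmetric per-tensor 6-bit quantization (signed range [-32, 31]), applied model-wide as the `scope: model` field suggests. This is an illustrative assumption about the scheme, not the PR's implementation; real code would operate on tensors rather than Python lists.

```python
def quantize_int6(weights):
    """Symmetric per-tensor quantization to signed 6-bit integers.

    Illustrative sketch: scales by the max absolute weight so values
    land in [-32, 31], the signed int6 range."""
    qmax = 2 ** (6 - 1) - 1  # 31
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int6(q, scale):
    """Recover approximate float weights from int6 codes and the scale."""
    return [v * scale for v in q]
```

Under this symmetric scheme the round-trip error per weight is bounded by half the scale step, which is what makes a quantized-eval-only pass (quantize once, evaluate, never train on the quantized weights) cheap to validate.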
Evaluation
quantized-eval-only
parameters: null
Sequence Length
sequence_length
train_length: 2048
eval_length: 2048
Novel Contributions
- Independent reproduction of PR #1610 on separate infrastructure with a +1.913e-5 BPB delta from the published seed-0 result
- Negative-result ablation showing a score-first n-gram posterior corrector degrades BPB across tested alpha/order settings
- Bug fix for the quantized-eval-only path in train_gpt.py to prevent None-model dereference and unbound local cleanup errors
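The third contribution's failure mode (a None-model dereference plus an unbound local during cleanup) maps to a common try/finally pattern. The sketch below uses hypothetical function names, not the actual `train_gpt.py` code, to show the shape of the guard:

```python
def run_quantized_eval(load_checkpoint, quantize, evaluate, release):
    """Guarded quantized-eval-only path (hypothetical names, not the PR's code).

    Two fixes this pattern illustrates:
      1. `model` is bound before the try block, so the finally clause can
         never raise UnboundLocalError when loading fails early.
      2. Cleanup runs only when a model actually exists, avoiding a
         None-model dereference."""
    model = None
    try:
        model = load_checkpoint()
        if model is None:
            raise RuntimeError("checkpoint did not produce a model")
        model = quantize(model)
        return evaluate(model)
    finally:
        if model is not None:
            release(model)
```

If `load_checkpoint` raises (or returns None), the finally clause sees `model is None` and skips `release`, whereas the unguarded version would either dereference None or, had `model` not been pre-bound, raise UnboundLocalError in cleanup.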