PR #1740
closed
Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix
by amrayach
val_bpb
1.0722
Architecture
Transformer
Optimizer
—
Artifact Size
15,999,394 bytes
Training Techniques
Quantization
int6
bits: 6
scope: all
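The card lists int6 quantization with bits: 6 applied to all parameters. The PR's exact scheme isn't shown here; a minimal sketch of symmetric per-tensor int6 quantization (a common choice, assumed for illustration) looks like:

```python
import numpy as np

def quantize_int6(w: np.ndarray):
    """Symmetric per-tensor quantization to signed 6-bit range [-31, 31].

    Illustrative sketch only; the PR's actual quantizer may differ
    (e.g. per-channel scales or asymmetric zero points).
    """
    qmax = 2 ** (6 - 1) - 1  # 31 for signed 6-bit
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_int6(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, s = quantize_int6(w)
w_hat = dequantize_int6(q, s)
```

The int8 container wastes two bits per value; a packed storage format would account for the ~16 MB artifact size reported above.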
Test-Time Training
LoRA TTT
parameters: null
Other
other
Score-first n-gram posterior corrector blended onto eval logits; tested as an adaptive posterior correction path and found to degrade BPB.
parameters: {"alpha":[0.1,0.3],"orders":[[8],[5,8,12]]}
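The corrector blends an n-gram posterior onto the model's eval logits, weighted by alpha. The PR's exact blend rule isn't reproduced here; a plausible sketch (probability-space interpolation, a hypothetical reading of "blended onto eval logits") is:

```python
import numpy as np

def blend_ngram_posterior(model_logits: np.ndarray,
                          ngram_probs: np.ndarray,
                          alpha: float = 0.1) -> np.ndarray:
    """Blend an n-gram posterior onto model logits (illustrative sketch).

    corrected_p = (1 - alpha) * softmax(model_logits) + alpha * ngram_probs
    Returns log-probabilities of the blended distribution. The PR reports
    this family of corrections worsens BPB monotonically in alpha.
    """
    # numerically stable softmax over the vocab axis
    z = model_logits - model_logits.max(axis=-1, keepdims=True)
    p_model = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    p = (1.0 - alpha) * p_model + alpha * ngram_probs
    return np.log(np.clip(p, 1e-12, None))

logits = np.array([2.0, 0.5, -1.0, 0.0])
uniform_ngram = np.full(4, 0.25)  # toy n-gram posterior over a 4-token vocab
logp = blend_ngram_posterior(logits, uniform_ngram, alpha=0.3)
```

With orders like [5, 8, 12], the ngram_probs term would itself be a mixture over several n-gram orders; that mixture step is omitted here.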
Evaluation
eval-only quantized path
parameters: null
Sequence Length
sequence_length
train_length: 2048
eval_length: 2048
Novel Contributions
- Independent reproduction of PR #1610 seed-0 result with a +1.913e-5 BPB delta
- Negative-result ablation showing an n-gram posterior corrector worsens BPB monotonically with alpha
- Bug fix for the quantized-eval-only path in train_gpt.py to prevent None-model dereference crashes