PR #1740
closed
Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix
by amrayach
val_bpb
1.0722
Architecture
Transformer
Optimizer
—
Artifact Size
15,999,394 bytes
Training Techniques
Quantization
int6
bits: 6
scope: all
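The card lists int6 quantization with bits: 6 applied to all parameters. The PR's exact scheme isn't shown here; a minimal sketch of symmetric per-tensor int6 quantization (a common choice, assumed for illustration) looks like:

```python
import numpy as np

def quantize_int6(w: np.ndarray):
    """Symmetric per-tensor quantization to signed 6-bit range [-31, 31].

    Illustrative sketch only; the PR's actual quantizer may differ
    (e.g. per-channel scales or asymmetric zero points).
    """
    qmax = 2 ** (6 - 1) - 1  # 31 for signed 6-bit
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_int6(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, s = quantize_int6(w)
w_hat = dequantize_int6(q, s)
```

The int8 container wastes two bits per value; a packed storage format would account for the ~16 MB artifact size reported above.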
Test-Time Training
LoRA TTT
parameters: null
Other
other
Score-first n-gram posterior corrector blended onto eval logits; tested as an adaptive posterior correction path and found to degrade BPB.
parameters: {"alpha":[0.1,0.3],"orders":[[8],[5,8,12]]}
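The corrector blends an n-gram posterior onto the model's eval logits, weighted by alpha. The PR's exact blend rule isn't reproduced here; a plausible sketch (probability-space interpolation, a hypothetical reading of "blended onto eval logits") is:

```python
import numpy as np

def blend_ngram_posterior(model_logits: np.ndarray,
                          ngram_probs: np.ndarray,
                          alpha: float = 0.1) -> np.ndarray:
    """Blend an n-gram posterior onto model logits (illustrative sketch).

    corrected_p = (1 - alpha) * softmax(model_logits) + alpha * ngram_probs
    Returns log-probabilities of the blended distribution. The PR reports
    this family of corrections worsens BPB monotonically in alpha.
    """
    # numerically stable softmax over the vocab axis
    z = model_logits - model_logits.max(axis=-1, keepdims=True)
    p_model = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    p = (1.0 - alpha) * p_model + alpha * ngram_probs
    return np.log(np.clip(p, 1e-12, None))

logits = np.array([2.0, 0.5, -1.0, 0.0])
uniform_ngram = np.full(4, 0.25)  # toy n-gram posterior over a 4-token vocab
logp = blend_ngram_posterior(logits, uniform_ngram, alpha=0.3)
```

With orders like [5, 8, 12], the ngram_probs term would itself be a mixture over several n-gram orders; that mixture step is omitted here.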
Evaluation
eval-only quantized path
parameters: null
Sequence Length
sequence_length
train_length: 2048
eval_length: 2048
Novel Contributions
- Independent reproduction of PR #1610 seed-0 result with a +1.913e-5 BPB delta
- Negative-result ablation showing an n-gram posterior corrector worsens BPB monotonically with alpha
- Bug fix for the quantized-eval-only path in train_gpt.py to prevent None-model dereference crashes