PR #1741
Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix
by amrayach
val_bpb
1.0722
Architecture
Transformer
Optimizer
—
Artifact Size
15,999,394 bytes
Training Techniques
Test-Time Training
LoRA TTT
parameters: {"rank":null,"mode":"phased","eval_only":true}
Other
other
Score-first n-gram posterior corrector layered on top of phased LoRA TTT; uses Laplace-smoothed unigram and n-gram counts to add a logit bias at eval time.
parameters: {"alpha":[0.1,0.3],"orders":[5,8,12]}
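The corrector idea can be sketched in a few lines: count (context, next-token) pairs at a fixed n-gram order, then turn Laplace-smoothed posteriors into a per-token logit bias. This is a minimal illustration of the technique, not the PR's actual code; the function names and the flat list-of-token-ids interface are assumptions.

```python
import math
from collections import Counter, defaultdict

def build_ngram_counts(tokens, order):
    """Count (context, next-token) occurrences for a fixed n-gram order.

    `tokens` is a flat list of token ids (an assumed interface, not the
    PR's actual data layout)."""
    counts = defaultdict(Counter)
    for i in range(order - 1, len(tokens)):
        ctx = tuple(tokens[i - order + 1:i])
        counts[ctx][tokens[i]] += 1
    return counts

def ngram_logit_bias(context, counts, vocab_size, alpha=0.1):
    """Laplace-smoothed log-posterior over the next token, usable as a bias
    added to the model's logits at eval time. `alpha` matches the swept
    smoothing parameter in the PR's config."""
    ctx_counts = counts.get(tuple(context), Counter())
    total = sum(ctx_counts.values())
    return [
        math.log((ctx_counts[t] + alpha) / (total + alpha * vocab_size))
        for t in range(vocab_size)
    ]
```

With `alpha` and the n-gram order drawn from the swept grid (`alpha ∈ {0.1, 0.3}`, `orders ∈ {5, 8, 12}`), the bias would be added to the transformer's output logits before scoring; the PR reports that this degrades BPB in all tested settings.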
Quantization
int6
bits: 6
scope: model
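For context on the int6 setting above, here is a minimal sketch of symmetric per-tensor 6-bit quantization (signed range [-32, 31]), applied model-wide as the `scope: model` field suggests. This is an illustrative assumption about the scheme, not the PR's implementation; real code would operate on tensors rather than Python lists.

```python
def quantize_int6(weights):
    """Symmetric per-tensor quantization to signed 6-bit integers.

    Illustrative sketch: scales by the max absolute weight so values
    land in [-32, 31], the signed int6 range."""
    qmax = 2 ** (6 - 1) - 1  # 31
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int6(q, scale):
    """Recover approximate float weights from int6 codes and the scale."""
    return [v * scale for v in q]
```

Under this symmetric scheme the round-trip error per weight is bounded by half the scale step, which is what makes a quantized-eval-only pass (quantize once, evaluate, never train on the quantized weights) cheap to validate.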
Evaluation
quantized-eval-only
parameters: null
Sequence Length
sequence_length
train_length: 2048
eval_length: 2048
Novel Contributions
- Independent reproduction of PR #1610 on separate infrastructure with a +1.913e-5 BPB delta from the published seed-0 result
- Negative-result ablation showing a score-first n-gram posterior corrector degrades BPB across tested alpha/order settings
- Bug fix for the quantized-eval-only path in train_gpt.py to prevent None-model dereference and unbound local cleanup errors
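The third contribution's failure mode (a None-model dereference plus an unbound local during cleanup) maps to a common try/finally pattern. The sketch below uses hypothetical function names, not the actual `train_gpt.py` code, to show the shape of the guard:

```python
def run_quantized_eval(load_checkpoint, quantize, evaluate, release):
    """Guarded quantized-eval-only path (hypothetical names, not the PR's code).

    Two fixes this pattern illustrates:
      1. `model` is bound before the try block, so the finally clause can
         never raise UnboundLocalError when loading fails early.
      2. Cleanup runs only when a model actually exists, avoiding a
         None-model dereference."""
    model = None
    try:
        model = load_checkpoint()
        if model is None:
            raise RuntimeError("checkpoint did not produce a model")
        model = quantize(model)
        return evaluate(model)
    finally:
        if model is not None:
            release(model)
```

If `load_checkpoint` raises (or returns None), the finally clause sees `model is None` and skips `release`, whereas the unguarded version would either dereference None or, had `model` not been pre-bound, raise UnboundLocalError in cleanup.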