PR #1868

open

Record: SmearGate BOS Fix 3-Seed Reproduction — val_bpb 1.06145 (3-seed mean)

by Christopher-Lee-McClendon
val_bpb: 1.0615
Architecture: Transformer
Optimizer: Muon
Artifact Size: ~15.95 MB

Training Techniques

Architecture
SmearGate
SmearGate attention with BOS boundary fix to prevent attention bleeding across document boundaries.
parameters: null
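
The boundary fix can be illustrated with a minimal sketch. It assumes that SmearGate blends each token's representation with its predecessor's through a sigmoid gate, and that the fix zeroes the gate wherever the current token is BOS. Neither detail is confirmed by this PR; `smear_with_bos_mask`, `gate_logits`, and `bos_id` are hypothetical names used only for illustration.

```python
import math


def smear_with_bos_mask(x, gate_logits, token_ids, bos_id):
    """Blend each position with its predecessor, except at document starts.

    x: list of T feature vectors (lists of floats)
    gate_logits: list of T hypothetical per-position gate logits
    token_ids: list of T integer token ids
    bos_id: id of the beginning-of-sequence token (assumed document separator)
    """
    out = []
    for t, row in enumerate(x):
        gate = 1.0 / (1.0 + math.exp(-gate_logits[t]))  # sigmoid, in (0, 1)
        # Assumed fix: a BOS token opens a new document, so it must not
        # receive anything from the last token of the previous document.
        if t == 0 or token_ids[t] == bos_id:
            gate = 0.0
        prev = x[t - 1] if t > 0 else [0.0] * len(row)
        out.append([a + gate * b for a, b in zip(row, prev)])
    return out
```

Without the mask, the BOS position of each packed document would mix in the final token of the preceding document, which is the cross-boundary leak the entry above describes.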
weight tying
Tied embeddings (weight tying) may be inherited from the base architecture's lineage; this PR does not state it explicitly.
parameters: null
Quantization
GPTQ
bits: null
scope: model
QAT
bits: 7
scope: embeddings
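
As a rough illustration of what 7-bit quantization of an embedding row involves, here is a sketch assuming symmetric, per-row, round-to-nearest quantization. The PR does not specify the scheme, so the scaling rule and the name `fake_quantize` are assumptions. The gradient handling that QAT needs during training (for example a straight-through estimator) is not shown.

```python
def fake_quantize(values, bits=7):
    """Round values onto a symmetric signed grid and map them back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 63 for 7 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax  # assumed per-row scale
    quantized = []
    for v in values:
        level = round(v / scale)
        level = max(-qmax, min(qmax, level))  # clamp to the representable range
        quantized.append(level * scale)
    return quantized
```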
Test-Time Training
score-first TTT
parameters: {"phases":3}
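
The "score-first" ordering can be sketched as follows: each evaluation chunk is scored before the model adapts on it, so no chunk is scored by a model that has already trained on it. This is a toy sketch, not the PR's implementation. `score_fn`, `adapt_fn`, and the scalar "model" are hypothetical stand-ins, and the `phases` parameter is not modeled.

```python
def score_first_ttt(chunks, score_fn, adapt_fn, state):
    """Score each chunk with the current state, then adapt on that chunk."""
    total_loss = 0.0
    total_count = 0
    for chunk in chunks:
        # 1) Score first: the state has not yet seen this chunk.
        total_loss += score_fn(state, chunk)
        total_count += len(chunk)
        # 2) Adapt afterwards, so only later chunks benefit.
        state = adapt_fn(state, chunk)
    return total_loss / total_count, state


# Toy stand-ins (hypothetical): the "model" is a single running estimate.
def squared_error(estimate, chunk):
    return sum((value - estimate) ** 2 for value in chunk)


def move_toward_mean(estimate, chunk, lr=1.0):
    chunk_mean = sum(chunk) / len(chunk)
    return estimate + lr * (chunk_mean - estimate)


mean_loss, final_state = score_first_ttt(
    [[1.0, 1.0], [1.0, 1.0]], squared_error, move_toward_mean, state=0.0
)
```

In the toy run, the first chunk is scored with the untouched initial state and only the second chunk benefits from adaptation.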
Sequence Length
sequence_length
train_length: 8192
eval_length: 8192
Regularization
weight decay
parameters: null

Novel Contributions

  • 3-seed reproduction of PR #1851
  • Confirms SmearGate BOS fix result is robust and reproducible
  • Reports mean and standard deviation across seeds 42, 314, and 1234
  • Provides byte-identical reproduction of the original training script
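
The two reproduction checks listed above can be sketched as below, assuming that "byte-identical" is verified by comparing file hashes and that the reported standard deviation is the sample standard deviation. The PR confirms neither. The function names and any paths passed to them are hypothetical.

```python
import hashlib
import statistics


def sha256_of(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


def summarize_seeds(bpb_by_seed):
    """Return (mean, sample standard deviation) of per-seed val_bpb values."""
    values = list(bpb_by_seed.values())
    return statistics.mean(values), statistics.stdev(values)
```

`summarize_seeds` expects a mapping from seed to measured val_bpb, for example with keys 42, 314, and 1234. `statistics.stdev` needs at least two values.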