PR #1868
open
Record: SmearGate BOS Fix 3-Seed Reproduction — val_bpb 1.06145 (3-seed mean)
by Christopher-Lee-McClendon
val_bpb
1.0615
Architecture
Transformer
Optimizer
Muon
Artifact Size
~15.95 MB
Training Techniques
Architecture
SmearGate
SmearGate attention with BOS boundary fix to prevent attention bleeding across document boundaries.
parameters: null
weight tying
Tied embeddings (weight tying) may be inherited from the base architecture in this PR's lineage, but the PR does not state this explicitly.
parameters: null
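The summary does not spell out how SmearGate is computed, so the following is only a minimal sketch of the boundary idea. It assumes that SmearGate blends each token's embedding with the previous token's embedding through a gate, and that the BOS fix switches that blend off at positions holding a BOS token, so the last token of one document cannot leak into the start of the next. The function name `smear_with_bos_fix`, the scalar `gate`, and the `BOS_ID` value are hypothetical and are not taken from the PR.

```python
BOS_ID = 0  # hypothetical beginning-of-sequence token id (assumption)

def smear_with_bos_fix(token_ids, embeddings, gate):
    """Blend each embedding with its predecessor, except at document starts.

    token_ids:  list[int], one id per position
    embeddings: list[list[float]], one vector per position
    gate:       float, assumed scalar smear strength
    """
    out = []
    for t, (tok, vec) in enumerate(zip(token_ids, embeddings)):
        # Position 0 has no predecessor. At a BOS token the predecessor
        # belongs to the previous document, so the smear is skipped there.
        if t == 0 or tok == BOS_ID:
            out.append(list(vec))
            continue
        prev = embeddings[t - 1]
        out.append([v + gate * p for v, p in zip(vec, prev)])
    return out

# Two packed documents: [BOS, 5, 7] and [BOS, 9].
ids = [BOS_ID, 5, 7, BOS_ID, 9]
emb = [[1.0], [2.0], [3.0], [10.0], [20.0]]
smeared = smear_with_bos_fix(ids, emb, gate=0.5)
```

In this toy input the second BOS (position 3) keeps its own embedding untouched, whereas without the check it would pick up half of the embedding at position 2, which belongs to the previous document.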
Quantization
GPTQ
bits: null
scope: model
QAT
bits: 7
scope: embeddings
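As an illustration of what 7-bit quantization-aware training of an embedding table can involve, here is a hedged sketch of the forward "fake quantization" step: weights are rounded to a 7-bit signed grid and mapped back to floats, so the model trains against the values it will have after quantization. The symmetric grid, the per-row scale, and the name `fake_quantize_row` are assumptions; the summary only states the bit width and the scope.

```python
def fake_quantize_row(row, bits=7):
    """Round a row of weights to a symmetric signed grid and map it back."""
    qmax = 2 ** (bits - 1) - 1           # 63 levels on each side for 7 bits
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return list(row)                  # all-zero row: nothing to quantize
    scale = max_abs / qmax                # assumed per-row scale
    out = []
    for w in row:
        q = round(w / scale)              # nearest integer level
        q = max(-qmax, min(qmax, q))      # clamp to the representable range
        out.append(q * scale)             # dequantize back to a float
    return out

row = [1.0, -0.5, 0.26, 0.0]
quantized = fake_quantize_row(row)
```

Each output differs from its input by at most half a quantization step. In actual training, the rounding is commonly treated as the identity in the backward pass (a straight-through estimator); that part is not shown here and may differ from what the PR does.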
Test-Time Training
score-first TTT
parameters: {"phases":3}
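The ordering implied by "score-first" can be sketched as follows: each evaluation chunk is scored with the model as it stood before seeing that chunk, and only afterwards is the model adapted on it, so no token is ever scored by a model that has already trained on it. This is a toy stand-in, assuming a Laplace-smoothed unigram counter in place of the real network and gradient updates; the function name `score_first_ttt` is hypothetical, and the `phases` parameter listed above is not modeled because the summary does not define it.

```python
import math

def score_first_ttt(chunks, vocab_size):
    """Return total bits spent on all chunks, adapting only after scoring."""
    counts = [1] * vocab_size             # Laplace-smoothed unigram counts
    total_bits = 0.0
    for chunk in chunks:
        total = sum(counts)
        # 1) Score the chunk with the model as it was before this chunk.
        for tok in chunk:
            total_bits += -math.log2(counts[tok] / total)
        # 2) Only now adapt on the chunk that has already been scored.
        for tok in chunk:
            counts[tok] += 1
    return total_bits

chunks = [[0, 0], [0, 0]]
bits = score_first_ttt(chunks, vocab_size=2)
```

The first chunk costs 1 bit per token under the uniform prior; the second is cheaper because the counts were updated after the first chunk was scored, which is the only benefit the adaptation is allowed to provide.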
Sequence Length
sequence_length
train_length: 8192
eval_length: 8192
Regularization
weight decay
parameters: null
Novel Contributions
- 3-seed reproduction of PR #1851
- Confirms SmearGate BOS fix result is robust and reproducible
- Reports mean and standard deviation across seeds 42, 314, and 1234
- Provides byte-identical reproduction of the original training script
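A multi-seed summary of the kind described in the list above can be computed with a few lines of Python. This is a sketch only: the scores below are placeholders rather than the PR's per-seed results, which this summary does not list, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the summary does not say which estimator was reported.

```python
import statistics

# Placeholder numbers only: these are NOT the per-seed results of this PR.
val_bpb_by_seed = {42: 1.0, 314: 2.0, 1234: 3.0}

def summarize(scores):
    """Return (mean, sample standard deviation) over the per-seed scores."""
    values = list(scores.values())
    return statistics.mean(values), statistics.stdev(values)

mean_bpb, std_bpb = summarize(val_bpb_by_seed)
print(f"mean={mean_bpb:.5f} std={std_bpb:.5f} over seeds {sorted(val_bpb_by_seed)}")
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population standard deviation instead, which is noticeably smaller with only three seeds.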