PR #1775
open
Record: SP8192 + No Gates + Multi-Phase Global SGD TTT — val_bpb 1.07285 (3-seed mean)
by dentity007
val_bpb
1.0729
Architecture
Transformer
Optimizer
SGD
Artifact Size
~15.94 MB
Training Techniques
Architecture
SmearGate
Disabled SmearGate in the base architecture.
parameters: null
Gated Attention
Disabled AttnOutGate / attention output gating in the base architecture.
parameters: null
Test-Time Training
LoRA TTT
parameters: {"phased":true,"num_phases":3,"prefix_docs":2000,"learning_rate":0.001,"epochs":1}
Optimizer
SGD
weight_decay: null
momentum: null
other_params: {"global_ttt":true,"phases":3}
Other
other
Multi-phase global SGD test-time training on already-scored documents at phase boundaries.
parameters: {"phases":3,"prefix_docs":2000}
other
Track B compliant score-before-update and single left-to-right pass evaluation/training ordering.
parameters: null
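The ordering described in the two entries above (score each document first, then train only on documents that have already been scored, at a small number of phase boundaries) can be sketched as follows. This is a minimal illustration, not the PR's implementation: the toy unigram model, the function names, the evenly spaced boundaries inside the first `prefix_docs` documents, and the choice to train on every document scored so far are all assumptions. It also updates all parameters directly rather than through LoRA adapters.

```python
import numpy as np

def nll_and_grad(logits, tokens):
    """Mean cross-entropy (nats/token) of a toy unigram model, and its gradient w.r.t. the logits."""
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    counts = np.bincount(tokens, minlength=logits.size)
    nll = -np.log(probs[tokens]).mean()
    grad = probs - counts / len(tokens)
    return nll, grad

def multiphase_score_first_ttt(logits, docs, num_phases=3, prefix_docs=2000, lr=1e-3):
    """Single left-to-right pass: every document is scored before any update that uses it."""
    logits = np.array(logits, dtype=float)  # work on a copy of the caller's weights
    prefix = min(prefix_docs, len(docs))
    # Hypothetical schedule: phase boundaries evenly spaced inside the prefix.
    boundaries = {prefix * (p + 1) // num_phases for p in range(num_phases)}
    total_nll, total_tokens = 0.0, 0
    for i, doc in enumerate(docs):
        # 1) Score with the current weights; this value is final and never revised.
        nll, _ = nll_and_grad(logits, doc)
        total_nll += nll * len(doc)
        total_tokens += len(doc)
        # 2) At a phase boundary, run one SGD epoch over already-scored documents only.
        if (i + 1) in boundaries:
            for seen in docs[: i + 1]:
                _, grad = nll_and_grad(logits, seen)
                logits -= lr * grad
    return total_nll / total_tokens, logits

if __name__ == "__main__":
    toy_docs = [np.array([0, 0, 0, 1]) for _ in range(6)]
    mean_nll, _ = multiphase_score_first_ttt(np.zeros(4), toy_docs, prefix_docs=6, lr=0.5)
    print(f"mean NLL with test-time training: {mean_nll:.4f} nats/token")
```

Because each update only sees documents whose scores are already recorded, no document's score can benefit from training on that same document, which is the property the score-before-update rule is meant to guarantee.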
Quantization
GPTQ
bits: 6
scope: embeddings/block weights
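To make the `bits: 6` setting concrete, the sketch below shows plain round-to-nearest symmetric quantization to 6-bit integers with one scale per row. This is deliberately not GPTQ: GPTQ additionally uses second-order (Hessian) information to compensate the rounding error of each weight in the remaining weights. The function names and the per-row scaling are assumptions for illustration only.

```python
import numpy as np

def quantize_rtn(weight, bits=6):
    """Round-to-nearest symmetric quantization with one scale per row (not GPTQ)."""
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits; integer range is [-32, 31]
    scale = np.abs(weight).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero rows
    q = np.clip(np.round(weight / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the integer codes back to floating-point weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 16)).astype(np.float32)
    q, scale = quantize_rtn(w)
    print("max abs reconstruction error:", np.abs(dequantize(q, scale) - w).max())
```

Each weight then needs 6 bits plus its share of the per-row scale. The codes are held in `int8` here only for convenience; a real artifact would pack them more tightly.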
Novel Contributions
- 3-seed mean val_bpb 1.07285 on 8xH100 SXM
- Multi-Phase Global SGD TTT applied to the PR #1667 base with both gates disabled
- Vanilla SP8192 tokenizer with no Casefold, CaseOps, SLOT, or n-gram cache
- Track B compliant implementation with score-before-update and single-pass ordering
- Demonstrated improvement over single-phase score-first TTT on the same base