PR #1306
open
Record: Causal SLOT + Pre-quant TTT — val_bpb 1.0846 (3-seed mean)
by resouer
val_bpb
1.0846
Architecture
Transformer
Optimizer
AdamW
Artifact Size
~15.95 MB
Training Techniques
Evaluation
sliding window eval
parameters: null
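As an illustration of sliding window eval in general, the sketch below splits a token stream into overlapping windows so that every token is scored exactly once. The submission lists no parameters for this step, so `window` and `stride` are hypothetical arguments and the function name is made up for this example.

```python
def sliding_windows(n_tokens, window, stride):
    """Yield (start, end, score_from) triples covering tokens [0, n_tokens).

    Tokens in [score_from, end) are scored; tokens in [start, score_from)
    are fed to the model as context only. Each token is scored exactly once.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    scored_until = 0
    while scored_until < n_tokens:
        if scored_until == 0:
            # The first window has no earlier context, so score all of it.
            end = min(window, n_tokens)
        else:
            # Later windows advance by `stride` and score only the new tokens.
            end = min(scored_until + stride, n_tokens)
        start = max(0, end - window)
        yield start, end, scored_until
        scored_until = end
```

A smaller stride gives each scored token more left context at the cost of more forward passes.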
Other
other
Causal SLOT: eval-time delta optimization restricted to context-only positions to preserve causal dependence
parameters: {"steps":8,"learning_rate":0.005}
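The causal restriction described above can be sketched with a toy example. This is a simplification under stated assumptions, not the submission's code: the delta here is a bias added directly to the logits (SLOT-style methods usually add it to hidden states), the update is plain gradient descent, and all function names are hypothetical. Only `steps=8` and `lr=0.005` come from the listed parameters. The point is that the delta is fitted on context positions `[0, score_from)` only, so targets at the positions being scored cannot influence it.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logits_per_pos, targets, delta, positions):
    """Mean negative log-likelihood (nats) over the given positions."""
    total = 0.0
    for i in positions:
        probs = softmax([x + d for x, d in zip(logits_per_pos[i], delta)])
        total -= math.log(probs[targets[i]])
    return total / len(positions)

def causal_slot_delta(logits_per_pos, targets, score_from, steps=8, lr=0.005):
    """Fit a logit-bias delta using context positions [0, score_from) only."""
    vocab = len(logits_per_pos[0])
    delta = [0.0] * vocab
    context = range(score_from)
    if len(context) == 0:
        return delta  # nothing to adapt on without context
    for _ in range(steps):
        grad = [0.0] * vocab
        for i in context:
            probs = softmax([x + d for x, d in zip(logits_per_pos[i], delta)])
            for v in range(vocab):
                # d(NLL)/d(delta_v) = p_v - 1[v == target]
                hit = 1.0 if v == targets[i] else 0.0
                grad[v] += (probs[v] - hit) / len(context)
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta
```

The fitted delta is then applied when scoring positions from `score_from` onward, for example with `mean_nll(logits_per_pos, targets, delta, range(score_from, len(targets)))`.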
Test-Time Training
full TTT
parameters: {"epochs":6,"learning_rate":0.0005,"freeze_first_blocks":2,"batch_size":32}
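One way to read `freeze_first_blocks: 2` is that parameters in the first two transformer blocks are excluded from the test-time update. The sketch below selects trainable parameter names under that reading. The `blocks.<index>.` naming scheme is hypothetical, and treating parameters outside the blocks (embeddings, final norm) as trainable is an assumption the source does not confirm.

```python
def trainable_names(param_names, freeze_first_blocks=2, block_prefix="blocks."):
    """Return parameter names that remain trainable during TTT.

    Names of the form '<block_prefix><index>.<rest>' with
    index < freeze_first_blocks are treated as frozen.
    """
    trainable = []
    for name in param_names:
        if name.startswith(block_prefix):
            index = int(name[len(block_prefix):].split(".", 1)[0])
            if index < freeze_first_blocks:
                continue  # frozen: skipped by the optimizer
        trainable.append(name)
    return trainable
```

The resulting list would then be handed to the optimizer (AdamW in this submission) for the listed 6 epochs at learning rate 0.0005 and batch size 32.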
Optimizer
AdamW
weight_decay: null
momentum: null
other_params: {"ttt_learning_rate":0.0005,"ttt_epochs":6}
LR Schedule
cosine decay
parameters: null
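For reference, a cosine decay schedule is commonly written as the function below. The submission gives no schedule parameters, so `total_steps`, `base_lr`, and `min_lr` are hypothetical arguments, and any warmup phase is omitted.

```python
import math

def cosine_decay_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step`, decaying from base_lr to min_lr along a half cosine."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] hold the boundary values.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The rate falls slowly at first, fastest around the midpoint, and flattens out again as it approaches `min_lr`.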
Architecture
BigramHash
BigramHash 3072 used in the base merged SOTA configuration referenced by the submission
parameters: {"size":3072}
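A bigram hash feature typically maps each (previous token, current token) pair to one of a fixed number of buckets, which then index a small embedding table. The sketch below shows that mapping with the listed size of 3072. The multiplicative hash, the `multiplier` constant, and the `bos_token` placeholder are assumptions made for illustration; the base configuration's actual hash is not given in the source.

```python
def bigram_bucket(prev_token, token, size=3072, multiplier=1_000_003):
    """Map a (prev_token, token) pair to a bucket index in [0, size)."""
    return (prev_token * multiplier + token) % size

def bigram_buckets(tokens, size=3072, bos_token=0):
    """Bucket index for every position; the first position pairs with bos_token."""
    buckets = []
    prev = bos_token
    for token in tokens:
        buckets.append(bigram_bucket(prev, token, size))
        prev = token
    return buckets
```

Because the table has far fewer rows than there are possible token pairs, distinct bigrams can collide in the same bucket, which is the usual trade-off for keeping the artifact small.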
XSA
XSA-all used in the base merged SOTA configuration referenced by the submission
parameters: null
Weight Averaging
EMA
parameters: null
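EMA weight averaging keeps a slowly moving copy of the weights that is updated after each optimizer step. A minimal sketch over flat lists of floats follows; the `decay` value is hypothetical, since the submission lists no EMA parameters.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Return the updated exponential moving average of the weights."""
    if len(ema_weights) != len(weights):
        raise ValueError("ema_weights and weights must have the same length")
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]
```

The averaged copy, rather than the raw final weights, is what would typically be evaluated or passed on to later stages.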
Quantization
GPTQ
bits: null
scope: all
Novel Contributions
- Causal SLOT with provably causal eval-time delta optimization using only context-scored positions
- Pre-quant AdamW test-time training before GPTQ quantization
- Coprime-stride multi-shard data loader
- Combined 3-seed mean val_bpb of 1.0846
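The coprime-stride idea in the list above rests on a standard number-theory fact: stepping through `n_shards` indices with a stride that shares no common factor with `n_shards` visits every shard exactly once before repeating. The sketch below illustrates only that property. The function names and the way a stride is picked are hypothetical, and how the submission's loader maps this onto shards, seeds, or workers is not stated in the source.

```python
from math import gcd

def coprime_stride(n_shards, preferred):
    """Smallest stride >= preferred that is coprime with n_shards."""
    stride = max(1, preferred)
    while gcd(stride, n_shards) != 1:
        stride += 1
    return stride

def shard_order(n_shards, stride, offset=0):
    """Order in which shards are visited when stepping by `stride` modulo n_shards."""
    if gcd(stride, n_shards) != 1:
        raise ValueError("stride must be coprime with n_shards")
    return [(offset + i * stride) % n_shards for i in range(n_shards)]
```

Compared with reading shards sequentially, a coprime stride spreads consecutive reads across the dataset without needing a stored permutation.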