PR #1605
Score-First TTT + Causal N-gram (order=82): val_bpb 0.29882 (3-seed mean)
Status: closed
by renqianluo
val_bpb: 0.2988
Architecture: Transformer
Optimizer: SGD
Artifact Size: ≤16MB
Training Techniques
Test-Time Training
score-first TTT
parameters: {"epochs":1,"learning_rate":0.005,"optimizer":"SGD"}
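The key property of score-first TTT is ordering: a chunk's loss is recorded before the model takes any gradient step on that chunk. Below is a minimal sketch of that ordering with the listed settings (1 epoch, SGD, learning rate 0.005, no momentum, no weight decay). It assumes a toy unigram-logit model in place of the real Transformer, and the names `softmax` and `score_first_ttt` and the chunking scheme are hypothetical.

```python
import math

# Hypothetical toy "model": one logit per vocabulary id (a unigram softmax).
# It stands in for the real Transformer only to show the score-then-update order.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def score_first_ttt(tokens, vocab_size, chunk_size, lr=0.005, epochs=1):
    """Return total bits over `tokens`, scoring each chunk BEFORE training on it."""
    logits = [0.0] * vocab_size
    total_bits = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        # 1) Score: tokens in this chunk are evaluated with weights that
        #    have not yet been updated on this chunk.
        probs = softmax(logits)
        total_bits += sum(-math.log2(probs[t]) for t in chunk)
        # 2) Update: plain SGD (no momentum, no weight decay) on the mean
        #    cross-entropy of the chunk, repeated for `epochs` passes.
        for _ in range(epochs):
            probs = softmax(logits)
            # d(sum of CE)/d logit_j = n * p_j - count_j
            grad = [len(chunk) * p for p in probs]
            for t in chunk:
                grad[t] -= 1.0
            logits = [w - lr * g / len(chunk) for w, g in zip(logits, grad)]
    return total_bits
```

Because scoring happens first, the first chunk is always scored by the untrained model; with a uniform start its cost is exactly `chunk_size * log2(vocab_size)` bits, regardless of what the update then learns.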
Optimizer
SGD
weight_decay: null
momentum: null
other_params: {"learning_rate":0.005}
Architecture
BigramHash
Causal backoff n-gram mixer, built incrementally during evaluation from tokens that have already been scored, with high-order context memory and full_c_fix gating.
parameters: {"order":82,"buckets":4194304,"full_c_fix":1}
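A hashed backoff n-gram of this kind can be sketched as two count tables, one for contexts and one for (context, next token) pairs, indexed by a hash reduced modulo `buckets`. Prediction tries the longest context first and backs off to shorter ones. This is a minimal sketch, not the PR's code: the class `HashedBackoffNgram`, `min_count`, and the min-clamp against hash collisions are hypothetical, and only the defaults `order=82` and `buckets=4194304` come from the parameters above. The exact semantics of `full_c_fix` are not described in the PR. Here it is assumed to mean "make no prediction when no context has been seen."

```python
class HashedBackoffNgram:
    """Hypothetical causal backoff n-gram over hashed context buckets.

    Counts are only added via `update` after a token has been scored,
    so a prediction never sees the token it is predicting.
    """

    def __init__(self, order=82, buckets=4194304, min_count=1):
        self.order = order
        self.buckets = buckets
        self.min_count = min_count
        # Sparse dicts stand in for fixed-size count arrays of length `buckets`.
        self.ctx_counts = {}   # (k, bucket(context)) -> count
        self.full_counts = {}  # (k, bucket(context + [token])) -> count

    def _key(self, k, items):
        # Each order k gets its own key space; contexts of one order may collide.
        return (k, hash(tuple(items)) % self.buckets)

    def prob(self, history, token):
        """Return (p, k) from the longest seen context of length k,
        or (None, 0) if no context has been seen (assumed full_c_fix gating)."""
        for k in range(min(self.order, len(history)), 0, -1):
            ctx = history[-k:]
            c = self.ctx_counts.get(self._key(k, ctx), 0)
            if c >= self.min_count:
                f = self.full_counts.get(self._key(k, list(ctx) + [token]), 0)
                # Clamp: a hash collision can make the pair count exceed c.
                return min(f, c) / c, k
        return None, 0

    def update(self, history, token):
        """Add `token` to every order's tables once it has been scored."""
        for k in range(1, min(self.order, len(history)) + 1):
            ctx = history[-k:]
            ck = self._key(k, ctx)
            fk = self._key(k, list(ctx) + [token])
            self.ctx_counts[ck] = self.ctx_counts.get(ck, 0) + 1
            self.full_counts[fk] = self.full_counts.get(fk, 0) + 1
```

In use, each position calls `prob(history, token)` first and `update(history, token)` second. That call order is what keeps the mixer causal.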
Evaluation
stride-based eval
parameters: {"stride":96}
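Stride-based evaluation commonly means sliding-window scoring. Each window advances by `stride` tokens, and only the newly added tokens are scored, so every scored token (after the first window) sees a long left context. The sketch below assumes that interpretation and a hypothetical 2048-token context window. Only `stride=96` comes from the PR, and the helper name `sliding_windows` is illustrative.

```python
def sliding_windows(n_tokens, context_len=2048, stride=96):
    """Return (win_start, win_end, score_start) triples: the model sees tokens
    [win_start, win_end) and only tokens [score_start, win_end) are scored."""
    spans = []
    win_end = min(context_len, n_tokens)
    spans.append((0, win_end, 0))  # the first window scores everything it covers
    while win_end < n_tokens:
        new_end = min(win_end + stride, n_tokens)
        # Keep the window at most context_len long; score only the new tokens.
        spans.append((max(0, new_end - context_len), new_end, win_end))
        win_end = new_end
    return spans
```

Each token is scored exactly once. A smaller stride gives later tokens more context at the cost of more forward passes.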
Novel Contributions
- Score-first test-time training: each chunk is scored before the weights are updated on it, so no token is evaluated by a model that has already trained on it
- Causal backoff n-gram mixer with order 82
- Entropy-adaptive blending between neural and n-gram predictions
- full_c_fix gating, which avoids making n-gram predictions for unseen contexts
- Aggressive n-gram blending, centered at an entropy of 1.0 bits
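Entropy-adaptive blending can be sketched as a mixture weight that grows with the neural model's uncertainty. When the network is confident (low entropy) it keeps most of the weight. When it is uncertain, the n-gram gets more. This sketch uses a sigmoid centered at 1.0 bits, matching the center listed above. The sigmoid form, `scale`, `alpha_min`, `alpha_max`, and all function names are hypothetical. It also assumes that when the n-gram makes no prediction (an unseen context), the neural probability is used unchanged.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (bits) of the neural model's next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def blend_weight(h_bits, center=1.0, scale=2.0, alpha_min=0.05, alpha_max=0.95):
    """Weight on the n-gram; rises as neural entropy exceeds `center` bits.
    Only center=1.0 comes from the PR; the other constants are hypothetical."""
    s = 1.0 / (1.0 + math.exp(-scale * (h_bits - center)))
    return alpha_min + (alpha_max - alpha_min) * s

def blended_prob(neural_probs, target, ngram_p):
    """Probability of `target` after mixing. `ngram_p` is None when the n-gram
    declined to predict (assumed gating), so the neural value passes through."""
    p_nn = neural_probs[target]
    if ngram_p is None:
        return p_nn
    a = blend_weight(entropy_bits(neural_probs))
    return (1.0 - a) * p_nn + a * ngram_p
```

A low center such as 1.0 bits hands most of the weight to the n-gram even at moderate neural entropy. That is one way to read "aggressive" here. The cost per token is then `-log2(blended_prob(...))` bits.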