PR #1873
open
Record: SP10240 Casefold + TTT + GPTQ + PPM-D — val_bpb 0.82005771 (3-seed mean)
by schattenjuwel
val_bpb
0.8201
Architecture
Transformer
Optimizer
SGD
Artifact Size
~15.99 MB
Training Techniques
Quantization
GPTQ
bits: 6
scope: weights and embeddings
Test-Time Training
full TTT
parameters: {"learning_rate":0.008,"epochs":4}
Evaluation
sliding window eval
parameters: {"stride":64}
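A sliding-window evaluation with stride 64 usually means each window advances 64 tokens and only the tokens not yet scored contribute to the loss, so every token is scored once with as much left context as the window allows. The sketch below illustrates that bookkeeping only. The function name `sliding_windows` and the `seq_len` argument are hypothetical; the card does not state the context length used in this PR.

```python
def sliding_windows(n_tokens, seq_len, stride=64):
    """Return (start, end, score_from) triples covering n_tokens.

    Tokens in [score_from, end) are scored in that window; tokens in
    [start, score_from) are fed to the model as context only.
    seq_len is an assumed context length, not a value from the PR.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("stride must be in (0, seq_len]")
    windows = []
    scored_upto = 0  # first token index that has not been scored yet
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows


if __name__ == "__main__":
    for window in sliding_windows(n_tokens=10, seq_len=4, stride=2):
        print(window)
```

After the first window, each step scores only `stride` new tokens, which trades extra forward passes for longer context per scored token.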
LR Schedule
cosine decay
parameters: null
Other
other
SP10240 casefold tokenizer with Unicode casefolding
parameters: {"vocab_size":10240}
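Unicode casefolding maps text to a case-insensitive canonical form before tokenization, so the 10240-entry vocabulary does not need separate entries for case variants. A minimal illustration using Python's built-in `str.casefold()` follows. How the PR wires casefolding into the tokenizer is not stated in this card, so the helper name `casefold_text` is hypothetical.

```python
def casefold_text(text: str) -> str:
    """Apply Unicode casefolding, which is more aggressive than lower()."""
    return text.casefold()


if __name__ == "__main__":
    print(casefold_text("Hello World"))  # hello world
    print(casefold_text("Straße"))       # strasse
```

Casefolding is lossy: the original capitalization cannot be recovered from the folded text.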
other
Byte-level PPM-D order-5 causal mixture with confidence gating and token-level probability mixing
parameters: {"order":5,"lambda_high_confidence":0.05,"confidence_threshold":0.9}
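The gating parameters above can be read as a two-regime linear mixture in probability space. The sketch below is one possible reading, not the PR's code: it assumes `lambda_high_confidence` is the weight left on the neural model when the PPM confidence reaches `confidence_threshold`, and that the confidence is computed from the PPM prediction alone, never from the target. The low-confidence weight `lambda_low_confidence` is a hypothetical placeholder because the card does not list it.

```python
import math


def token_prob_from_bytes(byte_probs):
    """Chain-rule product of per-byte PPM probabilities for one token."""
    return math.prod(byte_probs)


def mix_token_prob(p_neural, p_ppm, ppm_confidence,
                   confidence_threshold=0.9,
                   lambda_high_confidence=0.05,
                   lambda_low_confidence=1.0):  # hypothetical default
    """Mix neural and PPM probabilities of the target token.

    lam is the weight on the neural model (an assumed interpretation).
    The gate depends only on ppm_confidence, which must be computed
    before the target is seen, so the mixture stays a valid distribution.
    """
    if ppm_confidence >= confidence_threshold:
        lam = lambda_high_confidence
    else:
        lam = lambda_low_confidence
    return lam * p_neural + (1.0 - lam) * p_ppm


if __name__ == "__main__":
    p_ppm = token_prob_from_bytes([0.9, 0.95, 0.99])
    print(mix_token_prob(p_neural=0.2, p_ppm=p_ppm, ppm_confidence=0.95))
```

Mixing probabilities rather than log-probabilities keeps the result a convex combination of two distributions, so it needs no renormalization.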
Novel Contributions
- Byte-level PPM-D order-5 mixture added on top of TTT + GPTQ + SP10240 casefold stack
- Causal score-before-update PPM running on Rank 0 after distributed TTT scoring
- Token-level probability-space mixing of neural and PPM predictions
- Confidence-gated mixing based on PPM confidence
- 3-seed validation with reported mean BPB
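
The causal score-before-update rule in the list above can be sketched with a toy byte-level model. This is a simplified illustration under stated assumptions, not the PR's implementation: it uses PPM method D escape estimates but omits symbol exclusion, so its probabilities can sum to less than 1 and the reported bits per byte are slightly pessimistic. The names `BytePPMD` and `score_before_update` are hypothetical.

```python
import math
from collections import defaultdict


class BytePPMD:
    """Simplified byte-level PPM with method D escapes (no exclusion)."""

    def __init__(self, order=5):
        self.order = order
        # counts[context][symbol] -> times symbol followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, history: bytes, symbol: int) -> float:
        escape = 1.0
        for k in range(min(self.order, len(history)), -1, -1):
            table = self.counts.get(history[len(history) - k:])
            if not table:
                continue  # context never seen: drop to a shorter one
            total = sum(table.values())
            count = table.get(symbol, 0)
            if count > 0:
                return escape * (2 * count - 1) / (2 * total)
            escape *= len(table) / (2 * total)  # method D escape mass
        return escape / 256.0  # order -1: uniform over all byte values

    def update(self, history: bytes, symbol: int) -> None:
        for k in range(min(self.order, len(history)) + 1):
            self.counts[history[len(history) - k:]][symbol] += 1


def score_before_update(data: bytes, order=5) -> float:
    """Return bits per byte; each byte is scored before it is learned."""
    model = BytePPMD(order)
    total_bits = 0.0
    for i, symbol in enumerate(data):
        history = data[max(0, i - order):i]
        total_bits -= math.log2(model.prob(history, symbol))  # score
        model.update(history, symbol)                         # then update
    return total_bits / len(data)


if __name__ == "__main__":
    print(score_before_update(b"the cat sat on the mat. " * 20))
```

Because the model only ever sees bytes that have already been scored, the measured bits per byte reflect a legitimate online code length rather than a fit to the evaluation data.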