val_bpb: 1.1877
Architecture: Transformer
Optimizer: —
Artifact Size: <16 MB
Training Techniques

Quantization: mixed int8/int6 (bits: null; scope: model artifact)
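The entry names mixed int8/int6 quantization of the model artifact but not the scheme. A minimal sketch, assuming per-tensor symmetric quantization; the quantize_symmetric/dequantize helpers and the tensor-to-bit-width assignment are hypothetical.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    """Per-tensor symmetric quantization to a signed `bits`-wide grid."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8, 31 for int6
    scale = max(float(np.abs(w).max()), 1e-12) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                                 # int6 codes stored in int8 here

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = {"embed": rng.standard_normal((256, 64)),
           "mlp.w1": rng.standard_normal((64, 256))}
bits_for = {"embed": 8, "mlp.w1": 6}                # hypothetical assignment policy
artifact = {name: quantize_symmetric(w, bits_for[name])
            for name, w in weights.items()}
```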
Test-Time Training: score-first TTT
parameters: {"epochs": 2, "chunk_tokens": 8192}
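Score-first TTT means each chunk is scored under the weights as they stood before the model sees that chunk, and only then is the model fine-tuned on it (here for 2 epochs per 8192-token chunk, per the listed parameters). A sketch of the protocol; ToyModel is a stand-in for the quantized Transformer, and only the score-then-adapt loop is from the entry.

```python
import math

def chunked(seq, n):
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

class ToyModel:
    """Laplace-smoothed byte unigram whose counts play the role of weights."""
    def __init__(self):
        self.counts = [1] * 256
        self.total = 256
    def nll_bits(self, chunk):                      # score with CURRENT parameters
        return sum(-math.log2(self.counts[b] / self.total) for b in chunk)
    def finetune(self, chunk):                      # one "epoch" of adaptation
        for b in chunk:
            self.counts[b] += 1
            self.total += 1

def evaluate_with_ttt(model, data, chunk_tokens=8192, epochs=2):
    total_bits = 0.0
    for chunk in chunked(data, chunk_tokens):
        total_bits += model.nll_bits(chunk)         # score BEFORE any update
        for _ in range(epochs):                     # then adapt on the same chunk
            model.finetune(chunk)
    return total_bits / len(data)                   # bits per byte

print(evaluate_with_ttt(ToyModel(), bytes(range(256)) * 64))
```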
Sequence Length: train_length 8192, eval_length null
Other: Causal byte-level PPM-D mixture with confidence-gated convex interpolation between neural and PPM probabilities
parameters: {"order": 5, "threshold": 0.78, "lambda_hi": 0.9, "lambda_lo": 0.05}
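The listed parameters pin down the gate and the two mixing weights, but not which signal drives the gate; taking the neural model's top-1 probability as the confidence is an assumption of this sketch. When the neural model is confident it gets weight lambda_hi; otherwise the mixture falls back almost entirely to PPM.

```python
import numpy as np

def mix(p_neural: np.ndarray, p_ppm: np.ndarray, threshold: float = 0.78,
        lambda_hi: float = 0.9, lambda_lo: float = 0.05) -> np.ndarray:
    lam = lambda_hi if float(p_neural.max()) >= threshold else lambda_lo
    p = lam * p_neural + (1.0 - lam) * p_ppm        # convex: weights sum to 1
    return p / p.sum()                              # renormalize against rounding drift

p_neural = np.full(256, 1 / 256)                    # unsure neural model
p_ppm = np.zeros(256); p_ppm[ord("e")] = 1.0        # sharp PPM context
print(mix(p_neural, p_ppm)[ord("e")])               # gate defers to PPM: ~0.95
```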
Other: Score-before-update causal evaluation for both TTT and PPM in a single left-to-right pass
parameters: null
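This is the same contract at byte granularity: every byte is scored under statistics built only from bytes strictly to its left, and the model updates afterwards, all in one left-to-right pass. An order-0 stand-in below; a real order-5 PPM-D keeps per-context count tables with escape probabilities, which this sketch omits.

```python
import math
from collections import defaultdict

def ppm_bits(data: bytes) -> float:
    counts = defaultdict(int)                       # order-0 stand-in for PPM-D
    seen = 0
    bits = 0.0
    for b in data:
        p = (counts[b] + 1) / (seen + 256)          # Laplace-smoothed, causal
        bits += -math.log2(p)                       # score first ...
        counts[b] += 1                              # ... then update
        seen += 1
    return bits

data = b"abracadabra"
print(ppm_bits(data) / len(data), "bits/byte")
```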
Architecture (weight tying): SkipQuant Adapter Transformer stack with tied embeddings; implied by the SP8192/adapter setup, not explicitly stated
parameters: null
Novel Contributions
- Strict causal score-before-update evaluation for both TTT and byte-level PPM-D
- Confidence-gated convex mixture of neural and PPM probabilities
- Byte-level PPM-D mixture on top of a SkipQuant Adapter TTT stack
- UTF-8 byte probability distribution for BPB accounting (see the sketch after this list)
- Fast, compliant low-epoch TTT evaluation with 8192-token chunks
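BPB accounting over UTF-8 bytes divides the total negative log-likelihood in bits by the number of bytes in the UTF-8 encoding, not the number of code points or tokens. A minimal sketch; probs is a hypothetical per-byte probability sequence as the mixture model would emit it.

```python
import math

def bits_per_byte(probs: list[float]) -> float:
    return sum(-math.log2(p) for p in probs) / len(probs)

text = "naïve"                       # 5 code points, 6 UTF-8 bytes
data = text.encode("utf-8")
uniform = [1 / 256] * len(data)      # a uniform byte model scores exactly 8.0 bpb
print(len(data), bits_per_byte(uniform))
```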