PR #1862

open

Add SkipQuant Adapter TTT causal PPM-D byte mixture

by Hetul803
val_bpb: 1.1877
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: <16MB

Training Techniques

  • Quantization: mixed int8/int6 (bits: null; scope: model artifact)
  • Test-Time Training: score-first TTT (parameters: {"epochs": 2, "chunk_tokens": 8192})
  • Sequence Length: train_length 8192; eval_length null
  • Other: causal byte-level PPM-D mixture with confidence-gated convex interpolation between neural and PPM probabilities (parameters: {"order": 5, "threshold": 0.78, "lambda_hi": 0.9, "lambda_lo": 0.05})
  • Other: score-before-update causal evaluation for both TTT and PPM in a single left-to-right pass
  • Architecture (weight tying): SkipQuant Adapter Transformer stack with tied embeddings (implied by the SP8192/adapter setup; not explicitly stated)
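The confidence-gated convex interpolation can be sketched as below, using the listed parameters. This is a minimal illustration, not the PR's code: the gating signal (the neural model's top-1 probability) is an assumption, since the PR only states that the gate is confidence-based.

```python
def mix_probs(p_neural, p_ppm, threshold=0.78, lambda_hi=0.9, lambda_lo=0.05):
    """Confidence-gated convex mixture of neural and PPM byte distributions.

    Gating on the neural model's top-1 probability is an assumption; the
    listed parameters (threshold, lambda_hi, lambda_lo) come from the PR.
    """
    confidence = max(p_neural)  # assumed confidence signal
    lam = lambda_hi if confidence >= threshold else lambda_lo
    return [lam * pn + (1 - lam) * pp for pn, pp in zip(p_neural, p_ppm)]
```

When the neural model is confident, the mixture leans heavily on it (lambda_hi = 0.9); otherwise nearly all weight shifts to the PPM-D distribution (lambda_lo = 0.05).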

Novel Contributions

  • Strict causal score-before-update evaluation for both TTT and byte-level PPM-D
  • Confidence-gated convex mixture of neural and PPM probabilities
  • Byte-level PPM-D mixture on top of a SkipQuant Adapter TTT stack
  • UTF-8 byte probability distribution for BPB accounting
  • Fast, compliant low-epoch TTT evaluation with 8192-token chunks
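The strict causal score-before-update evaluation and byte-level BPB accounting can be sketched as a single left-to-right pass. The `model` interface (`predict`/`update`) is hypothetical, standing in for the combined TTT/PPM state; the PR does not publish an interface.

```python
import math

def eval_bpb_causal(byte_stream, model):
    """Single left-to-right pass: each byte is scored under the model state
    BEFORE any adaptation (TTT step or PPM count update) sees that byte.

    `model` is a hypothetical object with predict(context) -> 256-entry
    probability list and update(context, byte).
    """
    total_bits = 0.0
    context = []
    for b in byte_stream:
        probs = model.predict(context)       # score first ...
        total_bits += -math.log2(probs[b])   # BPB accounting over raw bytes
        model.update(context, b)             # ... then update
        context.append(b)
    return total_bits / len(byte_stream)     # bits per byte (val_bpb)
```

A model that never beats a uniform byte distribution scores exactly 8.0 bits per byte under this accounting.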