val_bpb: 1.1877
Architecture: Transformer
Optimizer: —
Artifact Size: <16 MB
Training Techniques

Quantization: mixed int8/int6 (bits: null; scope: model artifact)
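The entry names mixed int8/int6 quantization of the model artifact but not the scheme. A minimal sketch, assuming per-tensor symmetric quantization; the quantize_symmetric/dequantize helpers and the tensor-to-bit-width assignment are hypothetical.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    """Per-tensor symmetric quantization to a signed `bits`-wide grid."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8, 31 for int6
    scale = max(float(np.abs(w).max()), 1e-12) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                                 # int6 codes stored in int8 here

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = {"embed": rng.standard_normal((256, 64)),
           "mlp.w1": rng.standard_normal((64, 256))}
bits_for = {"embed": 8, "mlp.w1": 6}                # hypothetical assignment policy
artifact = {name: quantize_symmetric(w, bits_for[name])
            for name, w in weights.items()}
```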
Test-Time Training: score-first TTT
parameters: {"epochs": 2, "chunk_tokens": 8192}
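Score-first TTT means each chunk is scored under the weights as they stood before the model sees that chunk, and only then is the model fine-tuned on it (here for 2 epochs per 8192-token chunk, per the listed parameters). A sketch of the protocol; ToyModel is a stand-in for the quantized Transformer, and only the score-then-adapt loop is from the entry.

```python
import math

def chunked(seq, n):
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

class ToyModel:
    """Laplace-smoothed byte unigram whose counts play the role of weights."""
    def __init__(self):
        self.counts = [1] * 256
        self.total = 256
    def nll_bits(self, chunk):                      # score with CURRENT parameters
        return sum(-math.log2(self.counts[b] / self.total) for b in chunk)
    def finetune(self, chunk):                      # one "epoch" of adaptation
        for b in chunk:
            self.counts[b] += 1
            self.total += 1

def evaluate_with_ttt(model, data, chunk_tokens=8192, epochs=2):
    total_bits = 0.0
    for chunk in chunked(data, chunk_tokens):
        total_bits += model.nll_bits(chunk)         # score BEFORE any update
        for _ in range(epochs):                     # then adapt on the same chunk
            model.finetune(chunk)
    return total_bits / len(data)                   # bits per byte

print(evaluate_with_ttt(ToyModel(), bytes(range(256)) * 64))
```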
Sequence Length: train_length 8192, eval_length null
Other: Causal byte-level PPM-D mixture with confidence-gated convex interpolation between neural and PPM probabilities
parameters: {"order": 5, "threshold": 0.78, "lambda_hi": 0.9, "lambda_lo": 0.05}
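The listed parameters pin down the gate and the two mixing weights, but not which signal drives the gate; taking the neural model's top-1 probability as the confidence is an assumption of this sketch. When the neural model is confident it gets weight lambda_hi; otherwise the mixture falls back almost entirely to PPM.

```python
import numpy as np

def mix(p_neural: np.ndarray, p_ppm: np.ndarray, threshold: float = 0.78,
        lambda_hi: float = 0.9, lambda_lo: float = 0.05) -> np.ndarray:
    lam = lambda_hi if float(p_neural.max()) >= threshold else lambda_lo
    p = lam * p_neural + (1.0 - lam) * p_ppm        # convex: weights sum to 1
    return p / p.sum()                              # renormalize against rounding drift

p_neural = np.full(256, 1 / 256)                    # unsure neural model
p_ppm = np.zeros(256); p_ppm[ord("e")] = 1.0        # sharp PPM context
print(mix(p_neural, p_ppm)[ord("e")])               # gate defers to PPM: ~0.95
```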
Other: Score-before-update causal evaluation for both TTT and PPM in a single left-to-right pass
parameters: null
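This is the same contract at byte granularity: every byte is scored under statistics built only from bytes strictly to its left, and the model updates afterwards, all in one left-to-right pass. An order-0 stand-in below; a real order-5 PPM-D keeps per-context count tables with escape probabilities, which this sketch omits.

```python
import math
from collections import defaultdict

def ppm_bits(data: bytes) -> float:
    counts = defaultdict(int)                       # order-0 stand-in for PPM-D
    seen = 0
    bits = 0.0
    for b in data:
        p = (counts[b] + 1) / (seen + 256)          # Laplace-smoothed, causal
        bits += -math.log2(p)                       # score first ...
        counts[b] += 1                              # ... then update
        seen += 1
    return bits

data = b"abracadabra"
print(ppm_bits(data) / len(data), "bits/byte")
```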
Architecture (weight tying): SkipQuant Adapter Transformer stack with tied embeddings; implied by the SP8192/adapter setup, not explicitly stated
parameters: null
Novel Contributions
- Strict causal score-before-update evaluation for both TTT and byte-level PPM-D
- Confidence-gated convex mixture of neural and PPM probabilities
- Byte-level PPM-D mixture on top of a SkipQuant Adapter TTT stack
- UTF-8 byte probability distribution for BPB accounting (see the sketch after this list)
- Fast, compliant low-epoch TTT evaluation with 8192-token chunks
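BPB accounting over UTF-8 bytes divides the total negative log-likelihood in bits by the number of bytes in the UTF-8 encoding, not the number of code points or tokens. A minimal sketch; probs is a hypothetical per-byte probability sequence as the mixture model would emit it.

```python
import math

def bits_per_byte(probs: list[float]) -> float:
    return sum(-math.log2(p) for p in probs) / len(probs)

text = "naïve"                       # 5 code points, 6 UTF-8 bytes
data = text.encode("utf-8")
uniform = [1 / 256] * len(data)      # a uniform byte model scores exactly 8.0 bpb
print(len(data), bits_per_byte(uniform))
```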