PR #2032 (closed)
Record: SP8192 + Sliding-Window Eval + Conditional-PPM Byte Mixer - val_bpb 1.029282
by anmarhindi
val_bpb
1.0293
Architecture
Transformer
Optimizer
—
Artifact Size
15.59 MB
Training Techniques
Evaluation
sliding window eval
parameters: {"stride":64}
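The stride-64 sliding-window evaluation can be sketched as follows. This is a generic sliding-window bits-per-byte loop, not the PR's exact code: `nll_fn` is a hypothetical scoring hook, and the window size is an assumption (only `stride=64` is given above).

```python
import math

def sliding_window_bpb(nll_fn, tokens, window=512, stride=64):
    """Score a long stream in overlapping windows: each step scores only
    its last `stride` positions, so every scored position sees up to
    `window - stride` tokens of preceding context.

    `nll_fn(ctx, targets)` is a hypothetical hook returning the total
    negative log-likelihood (in nats) of `targets` given context `ctx`.
    """
    total_nats = 0.0
    scored = 0
    pos = 0
    while pos < len(tokens):
        start = max(0, pos + stride - window)  # left edge of the window
        ctx = tokens[start:pos]                # context only, not scored
        tgt = tokens[pos:pos + stride]         # positions scored this step
        total_nats += nll_fn(ctx, tgt)
        scored += len(tgt)
        pos += stride
    return total_nats / scored / math.log(2)   # nats -> bits per position
```

With a byte-level model, bits per position is directly bits per byte; a uniform-over-256 model gives exactly 8.0 bpb, which is a quick sanity check.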
conditional-PPM byte mixer
parameters: {"alpha":15,"beta":0.8,"order":5}
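A minimal order-n PPM-style byte model, for orientation only: the PR reports parameters `{"alpha": 15, "beta": 0.8, "order": 5}`, and while `order` below plays the same role, the escape/smoothing scheme here is a standard PPM-C-style estimate, not necessarily how the PR uses alpha and beta.

```python
from collections import defaultdict

class BytePPM:
    """Sketch of an order-n PPM byte predictor: count byte continuations
    for every context of length 0..order, then back off from the longest
    matching context, blending each level with the shorter-context
    estimate via its escape mass (PPM-C-style escape)."""

    def __init__(self, order=5):
        self.order = order
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(order + 1)]

    def update(self, history, byte):
        for n in range(self.order + 1):
            if len(history) >= n:
                ctx = bytes(history[len(history) - n:])
                self.counts[n][ctx][byte] += 1

    def predict(self, history):
        probs = [1.0 / 256] * 256                  # order -1: uniform
        for n in range(self.order + 1):
            if len(history) < n:
                break
            ctx = bytes(history[len(history) - n:])
            c = self.counts[n].get(ctx)
            if not c:
                continue
            total = sum(c.values())
            escape = len(c) / (total + len(c))     # escape probability
            new = [escape * p for p in probs]      # mass left for backoff
            for b, k in c.items():
                new[b] += k / (total + len(c))
            probs = new
        return probs
```

Each blending step preserves total probability (escape mass plus empirical mass sums to 1), so the output is a proper distribution ready to be mixed with the NN's byte distribution.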
Quantization
GPTQ
bits: 6
scope: all
Compression
lzma
level: null
Architecture
SmearGate
SmearGate BOS-mask fix and related base-model improvements from the lineage stack.
parameters: null
Regularization
dropout
parameters: {"stochastic_depth_max":0.02}
Other
other
Canonical first-byte marginalization over the SP8192 alphabet to derive a proper byte-level NN distribution before mixing with PPM.
parameters: null
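The first-byte marginalization described above amounts to crediting each token's softmax mass to the first byte of that token's byte encoding, then renormalizing. A minimal sketch, assuming the SP8192 vocabulary is available as a list of per-token byte strings (the exact export format is not given in the card):

```python
import numpy as np

def first_byte_marginal(token_probs, token_bytes):
    """Collapse a subword softmax into a distribution over the next byte:
    P(byte b) = sum of P(token t) over all tokens whose byte encoding
    starts with b. `token_bytes[i]` is the byte string of token i
    (an assumed representation of the SP8192 alphabet)."""
    byte_probs = np.zeros(256)
    for p, tb in zip(token_probs, token_bytes):
        if tb:                                # skip empty/special tokens
            byte_probs[tb[0]] += p
    s = byte_probs.sum()
    return byte_probs / s if s > 0 else np.full(256, 1.0 / 256)
```

The renormalization handles mass assigned to zero-length or special tokens, so the result is a proper byte-level distribution before mixing with PPM.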
Sequence Length
sequence_length
train_length: null
eval_length: null
Novel Contributions
- Conditional byte-level PPM mixer using a proper first-byte marginal from the SP8192 softmax
- Per-byte sigmoid gating between NN and PPM distributions
- Sliding-window stride-64 evaluation
- C2-clean normalization of byte_0 and remainder-byte distributions
- No new trainable parameters or additional artifact bytes
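The per-byte sigmoid gating in the list above can be sketched as a convex blend of the two distributions followed by renormalization. How the gate logits are derived in the PR is not specified here; since the PR adds no trainable parameters, they are presumably computed from context statistics, and the logits below are a stand-in input.

```python
import math

def mix_per_byte(nn_probs, ppm_probs, gate_logits):
    """Per-byte sigmoid gate between NN and PPM byte distributions:
    g = sigmoid(logit) weights the NN mass for that byte, (1 - g) the
    PPM mass, and the result is renormalized to sum to 1."""
    mixed = []
    for pn, pp, z in zip(nn_probs, ppm_probs, gate_logits):
        g = 1.0 / (1.0 + math.exp(-z))
        mixed.append(g * pn + (1.0 - g) * pp)
    s = sum(mixed)
    return [m / s for m in mixed]
```

A logit of 0 yields an even 50/50 blend per byte; large positive logits recover the NN distribution and large negative ones the PPM distribution.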