PR #1835

Status: open

Record: SP8192 + PPM-D byte mixture — 1.00136 BPB (3-seed mean)

by anmarhindi
val_bpb: 1.0014
Architecture: Transformer
Optimizer:
Artifact Size: 15,993,020 bytes

Training Techniques

Architecture
depth recurrence
Uses a 3-layer recurrent stack (depth recurrence) as part of the model setup.
parameters: {"layers":3}
Evaluation
sliding window eval
parameters: {"stride":64}
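A sliding-window evaluation with stride 64 could look roughly like the sketch below: each byte is scored exactly once, with the window advancing `stride` bytes per step. The function and model here are hypothetical stand-ins; the submission's actual model and context length are not specified in this listing.

```python
import math

def sliding_window_bpb(data, context_len, stride, next_byte_prob):
    """Score each byte once with up to `context_len` bytes of left context,
    advancing the evaluation window by `stride` bytes per step."""
    total_bits = 0.0
    pos = 0
    while pos < len(data):
        # Score the next `stride` bytes (fewer at the end of the stream).
        for i in range(pos, min(pos + stride, len(data))):
            ctx = data[max(0, i - context_len):i]
            p = next_byte_prob(ctx, data[i])  # model probability of the true byte
            total_bits += -math.log2(p)
        pos += stride
    return total_bits / len(data)

# Toy stand-in model: uniform over 256 byte values gives exactly 8 bits per byte.
uniform = lambda ctx, b: 1.0 / 256
bpb = sliding_window_bpb(b"hello sliding window", context_len=512,
                         stride=64, next_byte_prob=uniform)
print(bpb)  # -> 8.0
```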
Test-Time Training
score-first TTT
parameters: null
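The "score-first" ordering means each token is scored before the model adapts on it, so the current token never leaks into its own prediction. A minimal sketch of that loop (the `score`/`update` callables are hypothetical placeholders):

```python
def score_first(stream, score, update):
    """Score each symbol before updating the model on it, so the model only
    ever conditions on already-scored data."""
    bits = 0.0
    for sym in stream:
        bits += score(sym)   # predict/score using only past symbols
        update(sym)          # then adapt on the now-scored symbol
    return bits

# Toy demo: "score" returns how many symbols have been seen so far.
seen = []
total = score_first([1, 2, 3], score=lambda s: float(len(seen)), update=seen.append)
print(total)  # -> 3.0  (0 + 1 + 2)
```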
Quantization
int6
bits: 6
scope: model weights
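Int6 weight quantization can be illustrated with a symmetric per-tensor scheme mapping floats into a signed 6-bit range. This is only a generic sketch of the idea; the submission's actual quantization scheme (per-channel vs. per-tensor, rounding, packing) is not described here.

```python
def quantize_int6(weights):
    """Symmetric per-tensor int6 quantization: floats -> integers in [-31, 31]."""
    scale = max(abs(w) for w in weights) / 31.0 or 1.0  # avoid 0 for all-zero tensors
    q = [max(-31, min(31, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int6(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.24, 0.02, 0.93]
q, s = quantize_int6(w)
w_hat = dequantize_int6(q, s)  # reconstruction error is at most scale/2 per weight
```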
Compression
lzma
level: null
Other
other
Binary-λ-gated PPM-D byte-level mixture applied at eval time, mixing PPM probabilities with NN byte probabilities in probability space.
parameters: {"ppm_order":5,"ppm_lambda_hi":0.9,"ppm_lambda_lo":0.05,"ppm_conf_threshold":0.9,"ppm_subset_tokens":3000000}
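The binary-λ gate described above can be sketched as follows: when the PPM model is confident (its top probability meets `ppm_conf_threshold`), the mixture weights PPM at `ppm_lambda_hi`; otherwise at `ppm_lambda_lo`. The order-5 PPM-D model itself is omitted here; `p_ppm` and `p_nn` are assumed to be given per-byte distributions, and the 4-symbol alphabet is purely illustrative (the real setup mixes over 256 byte values).

```python
def mix_probs(p_ppm, p_nn, lam_hi=0.9, lam_lo=0.05, conf_threshold=0.9):
    """Binary-lambda gate: weight PPM heavily only when it is confident.
    Mixing happens in probability space, per byte value."""
    lam = lam_hi if max(p_ppm) >= conf_threshold else lam_lo
    return [lam * a + (1 - lam) * b for a, b in zip(p_ppm, p_nn)]

# Confident PPM (top prob 0.95 >= 0.9) -> lambda = 0.9.
p_ppm = [0.95, 0.03, 0.01, 0.01]
p_nn = [0.40, 0.30, 0.20, 0.10]
mixed = mix_probs(p_ppm, p_nn)
print(mixed[0])  # -> 0.895  (0.9 * 0.95 + 0.1 * 0.40)
```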

Novel Contributions

  • Binary-λ-gated PPM-D byte-level mixture at evaluation time
  • Probability-space mixing of PPM and NN per-byte probabilities
  • Score-first online PPM updates using already-scored tokens only
  • Order-5 byte-level PPM with confidence-based gating
  • Wrapped launcher using lzma + base85 to fit submission size
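The wrapped-launcher idea in the last bullet — lzma-compressing the submission and base85-encoding it inside a small self-extracting stub — can be illustrated minimally as below. The payload and stub here are toy stand-ins, not the submission's actual launcher.

```python
import base64
import lzma

# Toy payload standing in for the real submission script.
payload = "print('hello from wrapped launcher')"

# Compress with lzma, then encode as base85 so it embeds as ASCII text.
blob = base64.b85encode(lzma.compress(payload.encode()))

# Tiny stub that decodes, decompresses, and runs the payload.
stub = (
    "import base64, lzma\n"
    f"exec(lzma.decompress(base64.b85decode({blob!r})).decode())\n"
)
exec(stub)  # prints: hello from wrapped launcher
```

base85 encodes 4 bytes in 5 ASCII characters (vs. 4 in 6 for base64 with padding), which is why it helps when the submission must fit under a size cap.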