PR #1877

open

Record: SP8192 + Order-6 Strict Full-Val Byte PPM — 0.96255 BPB (3-seed mean)

by someone114514
val_bpb
0.9626
Architecture
Transformer
Optimizer
Artifact Size
15.997 MB

Training Techniques

Weight Averaging
EMA
parameters: null
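
For readers unfamiliar with EMA weight averaging, the idea can be sketched as below. This is a minimal illustration, not the PR's code: the PR lists no EMA parameters, so the `decay` value and the function name `ema_update` are assumptions chosen for the example.

```python
def ema_update(ema, weights, decay=0.999):
    """Return the exponential moving average of a flat list of weights.

    ASSUMPTION: decay=0.999 is a placeholder; the PR does not state a value.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


# Typical use: call once per training step, then evaluate with the averaged copy.
ema = [0.0, 2.0]
ema = ema_update(ema, [1.0, 2.0], decay=0.9)
```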
Evaluation
sliding window eval
parameters: {"stride":64}
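
One common way to run a sliding-window evaluation with a stride of 64 is sketched below: windows advance by `stride` tokens, and each window scores only the positions that no earlier window has scored, so every token is counted exactly once with as much left context as fits. The window length `seq_len` and the helper name `sliding_windows` are hypothetical; the PR states only the stride.

```python
def sliding_windows(n_tokens, seq_len, stride=64):
    """Yield (start, end, score_from) so each position is scored exactly once.

    The model sees tokens [start, end) but only positions [score_from, end)
    contribute to the loss. ASSUMPTION: seq_len is not given in the PR.
    """
    assert 0 < stride <= seq_len, "a stride larger than the window leaves gaps"
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        yield start, end, scored_upto
        scored_upto = end
        start += stride


windows = list(sliding_windows(200, 128, 64))
```

With these toy sizes the first window scores all 128 of its positions, and each later window scores only the new tail.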
Other
other
Strict full-validation order-6 byte-level PPM-D mixture applied at evaluation time, combining neural-network byte probabilities with prefix-only PPM probabilities.
parameters: {"ppm_order":6,"ppm_lambda_hi":0.9,"ppm_lambda_lo":0.05,"ppm_conf_threshold":0.9,"ppm_log_cache_size":1048576,"skip_quantized_eval":1,"sliding_batch_seqs":32}
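
The gated mixture described above might look roughly like the following sketch. It rests on assumptions the PR summary does not confirm: that lambda is the weight placed on the neural network, that the low weight (`ppm_lambda_lo`) is used when PPM confidence reaches the threshold, and that confidence is a quantity computed from the prefix alone (for example the largest PPM probability in the current context). The function names are hypothetical.

```python
import math


def mix_byte_prob(p_nn, p_ppm, ppm_conf,
                  lambda_hi=0.9, lambda_lo=0.05, conf_threshold=0.9):
    """Mixture probability of the target byte under a binary gate.

    ASSUMPTION: lam weights the NN; a confident PPM gets most of the mass.
    ppm_conf must be derived from the prefix only, never from the target byte.
    """
    lam = lambda_lo if ppm_conf >= conf_threshold else lambda_hi
    return lam * p_nn + (1.0 - lam) * p_ppm


def bits_per_byte(probs):
    """Average code length in bits for a sequence of per-byte probabilities."""
    return -sum(math.log2(p) for p in probs) / len(probs)


confident = mix_byte_prob(p_nn=0.5, p_ppm=1.0, ppm_conf=0.95)
unsure = mix_byte_prob(p_nn=0.5, p_ppm=0.1, ppm_conf=0.30)
```

Because the gate depends only on information available before the byte is seen, the mixture remains a valid predictive distribution for that byte.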

Novel Contributions

  • Replaced the prior order-4 PPM with a strict full-validation order-6 byte-level PPM mixture
  • Online PPM state built from already-scored byte prefix and updated only after each byte is scored
  • Binary prefix-only gating between NN and PPM probabilities based on PPM confidence threshold
  • Full-validation evaluation on the entire validation stream
  • Order-6 selected after full-val checks, outperforming slower order-7 and order-8 variants
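
The score-then-update discipline in the list above can be illustrated with a simplified byte-level PPM. This is a sketch under stated assumptions, not the PR's implementation: it uses the standard PPM-D estimates (symbol probability (2c - 1) / 2n, escape probability d / 2n) but omits symbol exclusion and the log cache, and the names `BytePPM` and `score_stream` are hypothetical.

```python
from collections import defaultdict


class BytePPM:
    """Simplified order-k byte PPM with PPM-D escapes and no symbol exclusion."""

    def __init__(self, order=6):
        self.order = order
        # counts[k][context_bytes][symbol] -> occurrences seen so far
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(order + 1)]

    def prob(self, history, symbol):
        """P(symbol | history), using only counts from already-scored bytes."""
        escape = 1.0
        for k in range(min(self.order, len(history)), -1, -1):
            ctx = bytes(history[len(history) - k:])
            table = self.counts[k].get(ctx)
            if not table:
                continue  # context never seen: try a shorter one
            n = sum(table.values())
            c = table.get(symbol, 0)
            if c:
                return escape * (2 * c - 1) / (2 * n)
            escape *= len(table) / (2 * n)  # PPM-D escape: d / 2n
        return escape / 256.0  # order -1 fallback: uniform over bytes

    def update(self, history, symbol):
        """Add the just-scored byte to every context order."""
        for k in range(min(self.order, len(history)) + 1):
            ctx = bytes(history[len(history) - k:])
            self.counts[k][ctx][symbol] += 1


def score_stream(data, order=6):
    """Score each byte BEFORE the model is updated with it."""
    model = BytePPM(order)
    history = bytearray()
    probs = []
    for b in data:
        probs.append(model.prob(history, b))  # 1) score from the prefix only
        model.update(history, b)              # 2) then reveal the byte
        history.append(b)
        if len(history) > order:
            del history[0]                    # keep only the last `order` bytes
    return probs


probs = score_stream(b"aaaa", order=2)
```

Swapping the order of the two marked steps would let the model see each byte before predicting it, which is exactly the leak that the strict prefix-only rule forbids.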