PR #1905

open

Report: PPM-D byte-level scoring is not a valid probability distribution, and why it appears to gain

by leon2k2k2k
val_bpb: 1.0324

Novel Contributions

  • Argues that the uniform-spread byte-level scoring used in recent PPM-D mixture submissions is not a valid probability distribution
  • Shows that the apparent gain comes from the scoring construction rather than from PPM itself
  • Compares uniform-spread versus conditional byte distributions and demonstrates that switching from one to the other can flip the sign of the PPM gain
  • Provides worked byte-level and token-level examples illustrating how loss is redistributed across bytes
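The normalization argument can be checked on a toy case. The sketch below assumes that "uniform-spread" means each byte of a k-byte token receives the pseudo-probability p^(1/k), so the token's -log2 p is split evenly across its bytes. It compares this with the exact conditional byte probabilities implied by the same token distribution. The two-token vocabulary `VOCAB` and the helper names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy vocabulary: token string -> model probability at this position.
VOCAB = {"ab": 0.5, "ac": 0.5}


def uniform_spread(token: str, p: float) -> list[float]:
    """Assumed uniform-spread rule: every byte of a k-byte token gets p ** (1/k)."""
    k = len(token)
    return [p ** (1.0 / k)] * k


def conditional(token: str, vocab: dict[str, float]) -> list[float]:
    """Exact P(byte_i | earlier bytes): mass of tokens extending prefix + byte,
    divided by mass of tokens extending the prefix. Assumes no token is a
    proper prefix of another, so the product of these equals p(token)."""
    probs = []
    for i in range(len(token)):
        num = sum(q for t, q in vocab.items() if t.startswith(token[: i + 1]))
        den = sum(q for t, q in vocab.items() if t.startswith(token[:i]))
        probs.append(num / den)
    return probs


def bits(probs: list[float]) -> list[float]:
    """Per-byte cost in bits."""
    return [math.log2(1.0 / p) for p in probs]


# Scores each rule assigns to the second byte after the prefix "a".
uniform_mass = uniform_spread("ab", VOCAB["ab"])[1] + uniform_spread("ac", VOCAB["ac"])[1]
cond_mass = conditional("ab", VOCAB)[1] + conditional("ac", VOCAB)[1]

uniform_bits = bits(uniform_spread("ab", VOCAB["ab"]))
cond_bits = bits(conditional("ab", VOCAB))

print("mass over next bytes, uniform spread:", round(uniform_mass, 4))
print("mass over next bytes, conditional:  ", round(cond_mass, 4))
print("per-byte bits, uniform spread:", [round(b, 4) for b in uniform_bits])
print("per-byte bits, conditional:   ", [round(b, 4) for b in cond_bits])
```

Both rules charge the same 1 bit in total for "ab", but the uniform-spread scores over the possible second bytes sum to about 1.414 rather than 1, so they are not a distribution over the next byte. The conditional rule puts 0 bits on the first byte and the full 1 bit on the second, while uniform spread charges 0.5 bits to each.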
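The sign flip can be reproduced with a per-byte linear mixture. This is a minimal sketch under stated assumptions: a fixed mixing weight `lam = 0.5`, the same uniform-spread rule as above (p^(1/k) per byte), and PPM-D byte probabilities that are invented for illustration rather than taken from any submission. The observed token is "ab" with model probability 0.5, where "ab" and "ac" are the only candidates.

```python
import math


def total_bits(probs):
    """Total cost in bits of scoring each byte with the given probabilities."""
    return sum(math.log2(1.0 / p) for p in probs)


def mix_bits(model_probs, ppm_probs, lam=0.5):
    """Total bits when each byte is scored by lam * model + (1 - lam) * ppm."""
    return sum(
        math.log2(1.0 / (lam * m + (1.0 - lam) * r))
        for m, r in zip(model_probs, ppm_probs)
    )


p_token = 0.5                     # model probability of the observed token "ab"
uniform = [p_token ** 0.5] * 2    # assumed uniform spread: ~0.707 on each byte
cond = [1.0, 0.5]                 # exact conditionals when "ab" and "ac" are 0.5 each
ppm = [0.99, 0.5]                 # hypothetical PPM-D byte probabilities

# Change in bits from mixing in PPM (negative means the mixture looks better).
gain_uniform = mix_bits(uniform, ppm) - total_bits(uniform)
gain_cond = mix_bits(cond, ppm) - total_bits(cond)

print("model alone:", round(total_bits(cond), 4), "bits")
print("PPM alone:  ", round(total_bits(ppm), 4), "bits")
print("mixing change, uniform spread:", round(gain_uniform, 4))
print("mixing change, conditional:   ", round(gain_cond, 4))
```

With these numbers the uniform-spread mixture appears to save about 0.035 bits, beating both the model alone (1 bit) and PPM alone (about 1.015 bits), while the same mixture over valid conditional byte probabilities costs about 0.007 bits. The model's total loss is identical under both rules; only its distribution across bytes changes, and that alone is enough to flip the sign.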