PR #1905 (open)
Report: PPM-D byte-level scoring is not a valid probability distribution, and why it appears to gain
by leon2k2k2k
val_bpb
1.0324
Architecture
—
Optimizer
—
Artifact Size
—
Training Techniques
Novel Contributions
- Argues that the uniform-spread byte-level scoring used in recent PPM-D mixture submissions is not a valid probability distribution
- Shows that the apparent gain comes from the scoring construction rather than from PPM itself
- Compares uniform-spread scoring with conditional byte distributions and demonstrates that the sign of the PPM gain can flip
- Provides worked byte-level and token-level examples illustrating how loss is redistributed across bytes
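The contrast between the two scoring schemes can be illustrated with a small sketch. It is not taken from the report: the three-token vocabulary and its probabilities are hypothetical, and "uniform spread" is assumed here to mean that a token with probability p and n bytes gives each of its bytes the score p^(1/n). The conditional scores are the chain-rule byte probabilities obtained by marginalizing the same token distribution.

```python
import math

# Hypothetical toy token distribution (an assumption, not data from the report).
token_probs = {b"aa": 0.6, b"ab": 0.3, b"ba": 0.1}


def uniform_spread_scores(token, p):
    """Assumed uniform-spread construction: every byte of the token gets p ** (1/n)."""
    n = len(token)
    return [p ** (1.0 / n)] * n


def conditional_byte_scores(token, probs):
    """Chain-rule scores P(b_i | b_<i), found by marginalizing over tokens."""
    scores = []
    for i in range(len(token)):
        prefix_mass = sum(p for t, p in probs.items() if t[:i] == token[:i])
        next_mass = sum(p for t, p in probs.items() if t[:i + 1] == token[:i + 1])
        scores.append(next_mass / prefix_mass)
    return scores


def first_byte_scores(table):
    """Collect the distinct scores each first-byte value receives across tokens."""
    out = {}
    for token, scores in table.items():
        out.setdefault(token[:1], set()).add(round(scores[0], 6))
    return out


uniform = {t: uniform_spread_scores(t, p) for t, p in token_probs.items()}
conditional = {t: conditional_byte_scores(t, token_probs) for t in token_probs}

for t in token_probs:
    # Per-byte loss in bits: the totals agree, the split across bytes does not.
    u_bits = [-math.log2(s) for s in uniform[t]]
    c_bits = [-math.log2(s) for s in conditional[t]]
    print(t, "uniform:", u_bits, "conditional:", c_bits)

# A valid next-byte distribution assigns one score per byte value given the
# past. Under the assumed uniform spread, byte b"a" gets two different scores
# depending on which token it turns out to belong to.
print("uniform first-byte scores:", first_byte_scores(uniform))
print("conditional first-byte scores:", first_byte_scores(conditional))
```

In this toy setting both schemes charge the same total loss per token, because the per-byte scores multiply back to the token probability. Only the conditional scores form a distribution over the next byte, though: the uniform-spread score for a byte depends on bytes that have not yet been seen, so it cannot be normalized over byte values at a fixed context.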