PR #1905 (open)

Report: PPM-D byte-level scoring is not a valid probability distribution, and why it appears to gain

val_bpb: 1.0324
Architecture: n/a
Optimizer: n/a
Artifact Size: n/a

Training Techniques

Argues that the uniform-spread byte-level scoring used in recent PPM-D mixture submissions is not a valid probability distribution
Shows that the apparent gain comes from the scoring construction rather than from PPM itself
Compares uniform-spread and conditional byte distributions, and demonstrates that the sign of the PPM gain can flip between the two
Provides worked byte-level and token-level examples illustrating how loss is redistributed across bytes
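A toy sketch of the comparison summarized above, under several assumptions that are not taken from the PR. "Uniform spread" is read as giving every byte of a token the score P(token)^(1/n). The PPM-D side is replaced by a hand-written byte predictor (`ppm_stand_in`, hypothetical, not PPM-D). The vocabulary is three 2-byte tokens over a three-symbol alphabet, and mixing is a fixed linear interpolation with an assumed weight `LAM = 0.5`. The sketch checks whether each mixed score sums to 1 over all byte strings, then compares the PPM gain on one observed token under both constructions.

```python
import math
from itertools import product

LAM = 0.5         # assumed mixture weight on the byte predictor
ALPHABET = "abc"  # toy alphabet standing in for the 256 byte values

# Hypothetical next-token distribution. Every token is exactly 2 bytes,
# so tokens and 2-byte strings coincide and no end-of-token symbol is needed.
TOKEN_PROBS = {"ab": 0.04, "ba": 0.46, "ca": 0.5}


def ppm_stand_in(prefix: str) -> dict[str, float]:
    """Hypothetical byte predictor q(b | prefix). NOT an implementation of PPM-D."""
    if prefix == "":
        return {"a": 0.01, "b": 0.49, "c": 0.5}
    if prefix == "a":
        return {"a": 0.05, "b": 0.9, "c": 0.05}
    return {b: 1 / 3 for b in ALPHABET}


def token_conditional(prefix: str) -> dict[str, float]:
    """Byte distribution implied by the token model: mass(prefix + b) / mass(prefix)."""
    mass = {b: sum(p for t, p in TOKEN_PROBS.items() if t.startswith(prefix + b))
            for b in ALPHABET}
    total = sum(mass.values())
    if total == 0.0:  # prefix unreachable under the token model
        return {b: 1 / len(ALPHABET) for b in ALPHABET}
    return {b: m / total for b, m in mass.items()}


def uniform_spread(token: str) -> list[float]:
    """Each byte gets P(token) ** (1 / n); the score ignores the byte's prefix."""
    share = TOKEN_PROBS.get(token, 0.0) ** (1 / len(token))
    return [share] * len(token)


def mixed_uniform(s: str) -> float:
    """Per-byte linear mix of the stand-in predictor with the uniform-spread scores."""
    spread = uniform_spread(s)
    prob = 1.0
    for i, b in enumerate(s):
        prob *= LAM * ppm_stand_in(s[:i])[b] + (1 - LAM) * spread[i]
    return prob


def mixed_conditional(s: str) -> float:
    """Per-byte linear mix of two normalized conditional byte distributions."""
    prob = 1.0
    for i, b in enumerate(s):
        prob *= (LAM * ppm_stand_in(s[:i])[b]
                 + (1 - LAM) * token_conditional(s[:i])[b])
    return prob


# Total mass over every 2-byte string: a valid distribution must sum to 1.
all_strings = ["".join(t) for t in product(ALPHABET, repeat=2)]
mass_uniform = sum(mixed_uniform(s) for s in all_strings)
mass_conditional = sum(mixed_conditional(s) for s in all_strings)

# PPM gain on one observed token, in bits (positive means the mixture helped).
observed = "ab"
bits_token_only = -math.log2(TOKEN_PROBS[observed])
gain_uniform = bits_token_only + math.log2(mixed_uniform(observed))
gain_conditional = bits_token_only + math.log2(mixed_conditional(observed))

print(f"total mass, uniform spread: {mass_uniform:.4f}")
print(f"total mass, conditional:    {mass_conditional:.4f}")
print(f"gain on 'ab', uniform:      {gain_uniform:+.3f} bits")
print(f"gain on 'ab', conditional:  {gain_conditional:+.3f} bits")
```

In this toy setup both constructions assign "ab" the same token-level probability (the per-byte factors multiply back to 0.04), yet only the conditional mixture sums to 1. The uniform-spread mixture reports a positive gain on "ab" while the conditional mixture reports a loss. Spreading moves surprisal off the rare first byte onto the second byte, which the token model actually predicts with certainty; the stand-in predictor is confident there, so the mixture appears to recover loss the token model never incurred on that byte.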