PR #1885

open

Record: PR #1850 + Anti-Hijack Gate — val_bpb 0.99445 (full val)

by leon2k2k2k
val_bpb
0.9944
Architecture
Transformer
Optimizer
Artifact Size
15.92 MB

Training Techniques

Evaluation
stride-based eval
parameters: {"stride":2048}
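As a rough illustration of what stride-based evaluation usually means, the sketch below enumerates windows in which each window scores only its newest `stride` tokens and uses the preceding tokens as context. The function name `eval_windows` and the `ctx_len` parameter are assumptions made for illustration; this card lists the eval length as null, so the real context length is not known here.

```python
def eval_windows(n_tokens, stride, ctx_len):
    """Yield (ctx_begin, score_begin, score_end) for each evaluation window.

    Tokens in [score_begin, score_end) are scored; tokens in
    [ctx_begin, score_begin) serve only as left context.
    ctx_len is a hypothetical parameter, not taken from the PR.
    """
    if stride <= 0 or ctx_len < stride:
        raise ValueError("need 0 < stride <= ctx_len")
    for score_begin in range(0, n_tokens, stride):
        score_end = min(score_begin + stride, n_tokens)
        ctx_begin = max(0, score_end - ctx_len)
        yield ctx_begin, score_begin, score_end


# Every token is scored exactly once because the scored spans tile [0, n_tokens).
windows = list(eval_windows(10, 4, 6))
```

With stride 2048, the 47,851,520 validation tokens reported on this card split into exactly 23,365 scored spans, since 2048 × 23,365 = 47,851,520.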
Other
other
PPM byte-mixture scoring with an anti-hijack gate that suppresses the high-lambda branch when the neural network is already confident on the actual byte.
parameters: {"nn_skip_thr_nats":0.277,"nn_skip_thr_bits":0.4,"ppm_conf_threshold":0.76,"ppm_lambda_hi":0.9,"ppm_lambda_lo":0.05,"ppm_order":4}
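The gate described above can be sketched as follows. This is a minimal sketch, not the PR's code: it assumes that lambda is the weight on the PPM probability in a linear mixture, that the high-lambda branch is taken when PPM confidence reaches `ppm_conf_threshold`, and that the gate compares the NN's cost on the actual byte against `nn_skip_thr_nats`. The signature of `score_byte` and the inputs `p_nn`, `p_ppm`, and `ppm_conf` are hypothetical; only the parameter values come from this card.

```python
import math

# 0.4 bits expressed in nats: 0.4 * ln(2) ≈ 0.277, matching the two thresholds above.
NN_SKIP_THR_NATS = 0.4 * math.log(2)


def score_byte(p_nn, p_ppm, ppm_conf, *,
               nn_skip_thr_nats=NN_SKIP_THR_NATS,
               ppm_conf_threshold=0.76,
               ppm_lambda_hi=0.9,
               ppm_lambda_lo=0.05):
    """Return the mixture cost, in bits, of the actual byte.

    p_nn / p_ppm: probability each model assigns to the actual byte.
    ppm_conf:     PPM's confidence in its own prediction (assumed input).
    """
    nn_cost_nats = -math.log(p_nn)
    ppm_is_confident = ppm_conf >= ppm_conf_threshold
    nn_is_confident = nn_cost_nats < nn_skip_thr_nats
    # Anti-hijack gate: skip the high-lambda branch when the NN is already confident.
    use_hi = ppm_is_confident and not nn_is_confident
    lam = ppm_lambda_hi if use_hi else ppm_lambda_lo
    p_mix = lam * p_ppm + (1.0 - lam) * p_nn  # assumed linear mixture
    return -math.log2(p_mix)
```

Under these assumptions, a byte with `p_nn = 0.9`, `p_ppm = 0.1`, and `ppm_conf = 0.8` is mixed with lambda 0.05 rather than 0.9, so a confident but wrong PPM prediction cannot drag down a byte the NN already handles well.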
other
Full validation scoring over all 47,851,520 tokens with gathered PPM scoring across all 8 ranks.
parameters: {"full_val_tokens":47851520,"ranks":8}
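One plausible reading of "gathered" scoring is that each rank produces NN probabilities for its own shard and the shards are concatenated in rank order before a single bits-per-byte figure is computed. The sketch below simulates that with plain Python dictionaries instead of a real collective; `gather_in_rank_order` and `val_bpb` are hypothetical helpers, and the per-byte cost here is the plain NN cost rather than the PPM mixture.

```python
import math


def gather_in_rank_order(per_rank):
    """Concatenate per-rank shards by ascending rank (stand-in for an all-gather)."""
    return [item for rank in sorted(per_rank) for item in per_rank[rank]]


def val_bpb(byte_probs, cost_fn=lambda p: -math.log2(p)):
    """Average cost in bits per byte over the gathered stream."""
    if not byte_probs:
        raise ValueError("empty validation stream")
    total_bits = sum(cost_fn(p) for p in byte_probs)
    return total_bits / len(byte_probs)


# Toy example with two ranks; the real run reports 8 ranks.
shards = {1: [0.5], 0: [0.25, 0.5]}
stream = gather_in_rank_order(shards)
bpb = val_bpb(stream)
```

If the validation set were split evenly, each of the 8 ranks would handle 5,981,440 of the 47,851,520 tokens; whether the PR shards it this way is not stated here.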
Sequence Length
sequence_length
train_length: null
eval_length: null

Novel Contributions

  • Anti-hijack gate in score_byte to prevent the PPM mixture from compounding when the NN already predicts the byte confidently.
  • Lowered the PPM confidence threshold (ppm_conf_threshold = 0.76) to widen the high-lambda region, with the anti-hijack gate guarding against hijacking.
  • Full-val evaluation over all 47.85M tokens with 8-rank gathered PPM scoring.
  • Stackable local patch compatible with PR #1881 and PR #1877.