PR #1885
open
Record: PR #1850 + Anti-Hijack Gate — val_bpb 0.99445 (full val)
by leon2k2k2k
val_bpb
0.9944
Architecture
Transformer
Optimizer
—
Artifact Size
15.92 MB
Training Techniques
Evaluation
stride-based eval
parameters: {"stride":2048}
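As a rough illustration of what a stride of 2048 implies for the full validation set, the sketch below enumerates scored spans in steps of the stride. The helper name `stride_windows` is hypothetical, and treating the spans as non-overlapping is an assumption: the card does not state the context length each span is scored with (eval_length is null), so only the span boundaries are shown.

```python
def stride_windows(n_tokens, stride):
    """Yield half-open (start, end) spans that cover n_tokens in steps of stride.

    Hypothetical helper: it only enumerates the spans to be scored and says
    nothing about how much left context the model sees for each span.
    """
    for start in range(0, n_tokens, stride):
        yield start, min(start + stride, n_tokens)


# 47,851,520 validation tokens at stride 2048 split into whole spans.
windows = list(stride_windows(47_851_520, 2048))
print(len(windows), windows[-1])
```

Because 47,851,520 is an exact multiple of 2048, the last span under this assumption is full length rather than a short remainder.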
Other
other
PPM byte-mixture scoring with an anti-hijack gate that suppresses the high-lambda branch when the neural network is already confident on the actual byte.
parameters: {"nn_skip_thr_nats":0.277,"nn_skip_thr_bits":0.4,"ppm_conf_threshold":0.76,"ppm_lambda_hi":0.9,"ppm_lambda_lo":0.05,"ppm_order":4}
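One way to read the gate described above is sketched here, using the listed parameter values. This is not the PR's code: the function name `score_byte_sketch`, its signature, the linear mixing in probability space, and the use of a single PPM confidence scalar are all assumptions made for illustration. Only the thresholds, the two lambda values, and the rule "skip the high-lambda branch when the NN loss on the actual byte is below the threshold" come from the card.

```python
import math

# Values taken from the parameters listed in the card.
NN_SKIP_THR_NATS = 0.277   # about 0.4 bits, since 0.4 * ln(2) ~= 0.2773
PPM_CONF_THRESHOLD = 0.76
PPM_LAMBDA_HI = 0.9
PPM_LAMBDA_LO = 0.05


def score_byte_sketch(p_nn_actual, p_ppm_actual, ppm_conf):
    """Return the mixture loss in nats for the actual byte.

    Hypothetical signature: p_nn_actual and p_ppm_actual are the probabilities
    each model assigns to the actual byte, and ppm_conf is the PPM model's
    confidence (assumed here to be a single scalar in [0, 1]).
    """
    nn_nats = -math.log(p_nn_actual)

    # Base rule: trust PPM heavily only when it is confident.
    lam = PPM_LAMBDA_HI if ppm_conf >= PPM_CONF_THRESHOLD else PPM_LAMBDA_LO

    # Anti-hijack gate: if the NN is already confident on the actual byte,
    # fall back to the low-lambda branch so PPM cannot dominate the mixture.
    if nn_nats < NN_SKIP_THR_NATS:
        lam = PPM_LAMBDA_LO

    # Assumed linear mixture in probability space.
    p_mix = (1.0 - lam) * p_nn_actual + lam * p_ppm_actual
    return -math.log(p_mix)


# NN confident, PPM confident but wrong: the gate keeps the NN in charge.
gated = score_byte_sketch(p_nn_actual=0.9, p_ppm_actual=0.01, ppm_conf=0.95)
# NN unsure, PPM confident and right: the high-lambda branch applies.
ungated = score_byte_sketch(p_nn_actual=0.2, p_ppm_actual=0.9, ppm_conf=0.95)
```

In the first call, an ungated mixture would use lambda 0.9 and drop the probability of the actual byte to about 0.099; with the gate it stays near the NN's 0.9.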
other
Full validation scoring over all 47,851,520 tokens with gathered PPM scoring across all 8 ranks.
parameters: {"full_val_tokens":47851520,"ranks":8}
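For a full-validation number, the per-rank results have to be combined before dividing, not averaged rank by rank. A minimal sketch of that reduction follows, assuming each rank reports a summed loss in nats and a byte count; the function name `gathered_bpb` and the example figures are hypothetical, and the actual gather mechanism used in the PR is not shown here.

```python
import math


def gathered_bpb(per_rank):
    """Combine (nats_sum, byte_count) pairs from all ranks into bits per byte.

    Hypothetical helper: sums are pooled first, so ranks that scored more
    bytes carry proportionally more weight than in a mean of per-rank bpb.
    """
    total_nats = sum(nats for nats, _ in per_rank)
    total_bytes = sum(count for _, count in per_rank)
    return total_nats / (math.log(2) * total_bytes)


# Made-up figures for two ranks: 10 bits over 10 bytes, 30 bits over 20 bytes.
example = [(10 * math.log(2), 10), (30 * math.log(2), 20)]
bpb = gathered_bpb(example)
```

With these made-up figures the pooled result is 40 bits over 30 bytes, whereas averaging the two per-rank values (1.0 and 1.5) would give 1.25.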
Sequence Length
sequence_length
train_length: null
eval_length: null
Novel Contributions
- Anti-hijack gate in score_byte that keeps the PPM mixture out of the high-lambda branch when the NN already predicts the actual byte confidently.
- Lowered PPM confidence threshold to widen the high-lambda region while guarding against hijacking.
- Full-val evaluation over all 47.85M tokens with 8-rank gathered PPM scoring.
- Stackable local patch compatible with PR #1881 and PR #1877.