PR #2076

open

SP8192 Byte-PPM O=5 + V6 micro, 3-seed mean 0.92967555 BPB

by teslaecoView on GitHub
val_bpb: 0.9297
Architecture: Transformer
Optimizer: (unspecified)
Artifact Size: 15.92 MB

Training Techniques

Sequence Length
train_length: 8192
eval_length: null
Other
  • Train-only sparse micro-injection of the V6 Privacy-Web-Filtering dataset into the training shard
    parameters: {"injected_tokens": 8192, "train_only": true, "validation_unchanged": true}
  • Byte-PPM model variant with order O=5 and the V6 micro modification
    parameters: {"o": 5}
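The PR itself does not include the model code, but the general technique the card names, order-5 Prediction by Partial Matching over raw bytes with PPM-C style escapes, can be sketched as follows. The class name `BytePPM` and every implementation detail here are illustrative assumptions, not the submission's actual stack:

```python
import math

class BytePPM:
    """Minimal order-O PPM over raw bytes with method-C escapes.
    Illustrative sketch only -- not the PR's Byte-PPM implementation."""

    def __init__(self, order=5):
        self.order = order
        self.counts = {}  # bytes context -> {next_byte: frequency}

    def update(self, data):
        """Record counts for every context of length 0..order over `data`."""
        for i in range(len(data)):
            sym = data[i]
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                ctx = bytes(data[i - k:i])
                table = self.counts.setdefault(ctx, {})
                table[sym] = table.get(sym, 0) + 1

    def prob(self, context, sym):
        """PPM-C probability of `sym` after `context`, longest context first,
        escaping to shorter contexts with probability distinct/(total+distinct)
        and excluding symbols already predicted at a longer order."""
        context = bytes(context[-self.order:])
        mass = 1.0          # probability mass not yet assigned
        excluded = set()    # symbols handled at a longer context
        for k in range(len(context), -1, -1):
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue
            avail = {s: c for s, c in table.items() if s not in excluded}
            if not avail:
                continue
            total = sum(avail.values())
            distinct = len(avail)
            denom = total + distinct
            if sym in avail:
                return mass * avail[sym] / denom
            mass *= distinct / denom
            excluded.update(avail)
        # order -1 fallback: uniform over bytes never seen anywhere
        return mass / (256 - len(excluded))

    def bpb(self, data, warmup=b""):
        """Bits per byte of `data` under the current (frozen) counts."""
        hist = bytearray(warmup)
        bits = 0.0
        for sym in data:
            bits += -math.log2(self.prob(bytes(hist), sym))
            hist.append(sym)
        return bits / len(data)
```

A useful sanity property of the escape scheme is that the predicted distribution still sums to one, e.g. after training on `b"abcabcabc"` the 256 per-byte probabilities given context `b"ab"` total 1.0.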

Novel Contributions

  • Train-only sparse micro-injection of V6 Privacy-Web-Filtering data
  • Reproduction of SP8192 Byte-PPM O=5 stack with 3-seed evaluation
  • Official FineWeb validation preserved without leakage
  • Provided rebuild script, logs, and manifests for reproducibility
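The train-only injection and unchanged-validation claims above boil down to one invariant: injected bytes enter only the training shard, so the validation shard stays byte-identical under a checksum. A minimal sketch of that invariant, in which `inject_micro`, the chunk size, and the seeding are all hypothetical stand-ins rather than the PR's actual rebuild script:

```python
import hashlib
import random

def inject_micro(train_shard: bytes, injection: bytes,
                 n_tokens: int = 8192, seed: int = 0) -> bytes:
    """Sparsely splice `n_tokens` bytes of `injection` into a copy of the
    training shard at random offsets. Hypothetical helper: the validation
    shard is never passed in, so it cannot be modified."""
    rng = random.Random(seed)
    payload = injection[:n_tokens]
    out = bytearray(train_shard)
    # scatter the payload through the shard in small chunks
    chunk = 256
    pieces = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    for piece in pieces:
        pos = rng.randrange(len(out) + 1)
        out[pos:pos] = piece  # insert without overwriting existing bytes
    return bytes(out)

def shard_digest(shard: bytes) -> str:
    """Checksum used to assert a shard is byte-identical before and after."""
    return hashlib.sha256(shard).hexdigest()
```

Usage follows the claim directly: record `shard_digest(val_shard)` before training, rebuild the training shard with `inject_micro`, and assert the validation digest is unchanged while the training shard grew by exactly 8192 bytes.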