val_bpb: 0.9297 (validation bits per byte; computation sketched below)
Architecture: Transformer
Optimizer: —
Artifact Size: 15.92 MB
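A minimal sketch of how a bits-per-byte score such as val_bpb is typically derived, assuming the evaluation produces a summed negative log-likelihood in nats over the validation bytes (the entry's actual evaluation pipeline is not reproduced here):

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a byte
    stream into bits per byte: divide by ln(2) to convert nats to bits,
    then by the number of bytes that were scored."""
    return total_nll_nats / (math.log(2) * total_bytes)

# Example: a summed NLL of 644.4 nats over 1000 bytes is ~0.9297 bpb.
print(round(bits_per_byte(644.4, 1000), 4))  # 0.9297
```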
Training Techniques

Sequence Length (sequence_length)
- train_length: 8192
- eval_length: null
Other
- Train-only sparse micro-injection of the V6 Privacy-Web-Filtering dataset into the training shard (sketched below)
  parameters: {"injected_tokens": 8192, "train_only": true, "validation_unchanged": true}
- Byte-PPM model variant with O=5 and the V6 micro modification (sketched below)
  parameters: {"o": 5}
Novel Contributions
- Train-only sparse micro-injection of V6 Privacy-Web-Filtering data
- Reproduction of the SP8192 Byte-PPM O=5 stack with a 3-seed evaluation (aggregation sketched below)
- Official FineWeb validation set preserved without leakage
- Rebuild script, logs, and manifests provided for reproducibility
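A minimal sketch of how the 3-seed evaluation could be aggregated into a single reported figure; evaluate_bpb is a hypothetical callable standing in for the entry's actual evaluation run:

```python
import statistics

def aggregate_three_seed(evaluate_bpb, seeds=(0, 1, 2)):
    """Run the (hypothetical) evaluate_bpb(seed) once per seed and report
    the mean and sample standard deviation of the bpb scores."""
    scores = [evaluate_bpb(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)
```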