PR #2076

open

SP8192 Byte-PPM O=5 + V6 micro, 3-seed mean 0.92967555 BPB

by teslaecoView on GitHub
val_bpb: 0.9297
Architecture: Transformer
Optimizer: (unspecified)
Artifact Size: 15.92 MB

Training Techniques

Sequence Length
train_length: 8192
eval_length: null
Other
  • Train-only sparse micro-injection of the V6 Privacy-Web-Filtering dataset into the training shard
    parameters: {"injected_tokens": 8192, "train_only": true, "validation_unchanged": true}
  • Byte-PPM model variant with order O=5 and the V6 micro modification
    parameters: {"o": 5}
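The PR itself does not include the model code, but the general technique the card names, order-5 Prediction by Partial Matching over raw bytes with PPM-C style escapes, can be sketched as follows. The class name `BytePPM` and every implementation detail here are illustrative assumptions, not the submission's actual stack:

```python
import math

class BytePPM:
    """Minimal order-O PPM over raw bytes with method-C escapes.
    Illustrative sketch only -- not the PR's Byte-PPM implementation."""

    def __init__(self, order=5):
        self.order = order
        self.counts = {}  # bytes context -> {next_byte: frequency}

    def update(self, data):
        """Record counts for every context of length 0..order over `data`."""
        for i in range(len(data)):
            sym = data[i]
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                ctx = bytes(data[i - k:i])
                table = self.counts.setdefault(ctx, {})
                table[sym] = table.get(sym, 0) + 1

    def prob(self, context, sym):
        """PPM-C probability of `sym` after `context`, longest context first,
        escaping to shorter contexts with probability distinct/(total+distinct)
        and excluding symbols already predicted at a longer order."""
        context = bytes(context[-self.order:])
        mass = 1.0          # probability mass not yet assigned
        excluded = set()    # symbols handled at a longer context
        for k in range(len(context), -1, -1):
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue
            avail = {s: c for s, c in table.items() if s not in excluded}
            if not avail:
                continue
            total = sum(avail.values())
            distinct = len(avail)
            denom = total + distinct
            if sym in avail:
                return mass * avail[sym] / denom
            mass *= distinct / denom
            excluded.update(avail)
        # order -1 fallback: uniform over bytes never seen anywhere
        return mass / (256 - len(excluded))

    def bpb(self, data, warmup=b""):
        """Bits per byte of `data` under the current (frozen) counts."""
        hist = bytearray(warmup)
        bits = 0.0
        for sym in data:
            bits += -math.log2(self.prob(bytes(hist), sym))
            hist.append(sym)
        return bits / len(data)
```

A useful sanity property of the escape scheme is that the predicted distribution still sums to one, e.g. after training on `b"abcabcabc"` the 256 per-byte probabilities given context `b"ab"` total 1.0.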

Novel Contributions

  • Train-only sparse micro-injection of V6 Privacy-Web-Filtering data
  • Reproduction of SP8192 Byte-PPM O=5 stack with 3-seed evaluation
  • Official FineWeb validation preserved without leakage
  • Provided rebuild script, logs, and manifests for reproducibility
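The train-only injection and unchanged-validation claims above boil down to one invariant: injected bytes enter only the training shard, so the validation shard stays byte-identical under a checksum. A minimal sketch of that invariant, in which `inject_micro`, the chunk size, and the seeding are all hypothetical stand-ins rather than the PR's actual rebuild script:

```python
import hashlib
import random

def inject_micro(train_shard: bytes, injection: bytes,
                 n_tokens: int = 8192, seed: int = 0) -> bytes:
    """Sparsely splice `n_tokens` bytes of `injection` into a copy of the
    training shard at random offsets. Hypothetical helper: the validation
    shard is never passed in, so it cannot be modified."""
    rng = random.Random(seed)
    payload = injection[:n_tokens]
    out = bytearray(train_shard)
    # scatter the payload through the shard in small chunks
    chunk = 256
    pieces = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    for piece in pieces:
        pos = rng.randrange(len(out) + 1)
        out[pos:pos] = piece  # insert without overwriting existing bytes
    return bytes(out)

def shard_digest(shard: bytes) -> str:
    """Checksum used to assert a shard is byte-identical before and after."""
    return hashlib.sha256(shard).hexdigest()
```

Usage follows the claim directly: record `shard_digest(val_shard)` before training, rebuild the training shard with `inject_micro`, and assert the validation digest is unchanged while the training shard grew by exactly 8192 bytes.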