PR #506 (open)

Non-record: 2026-03-22_SuperchunkBPE_SP1024

by eshansinghal14
val_bpb: 1.2294
Architecture: Transformer
Optimizer:
Artifact Size: 15.1 MiB

Training Techniques: Other

Superchunk BPE tokenization: a two-phase BPE scheme in which the first phase learns merges within chunks and the second phase learns merges across chunk boundaries, with merges from both phases interleaved by frequency into a single merge table (see the sketch after the contribution list below).

Novel Contributions

  • Introduction of Superchunk BPE tokenization, combining phase-1 (within-chunk) and phase-2 (cross-chunk) merges into a single merge table.
  • A Rust BPE tokenizer with superchunking, trained to a vocab size of 1024.
  • A short training run on 8×H100 GPUs under a 600 s wall-clock budget, targeting the non-record 16 MB track.
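To make the two-phase scheme concrete, below is a minimal, self-contained sketch of superchunk BPE merge learning. It is not the submission's Rust tokenizer: the toy corpus, the fixed 8-byte chunking, the merge budgets, and all function names are illustrative assumptions, and phase 2 here simply runs over the concatenated stream rather than restricting itself to boundary-spanning pairs.

```rust
// Sketch of superchunk BPE merge learning, assuming the two-phase scheme
// described above. Corpus, chunking, budgets, and names are illustrative.
use std::collections::HashMap;

type Token = u32;

/// Count adjacent-pair frequencies across a set of token sequences.
fn pair_counts(seqs: &[Vec<Token>]) -> HashMap<(Token, Token), usize> {
    let mut counts = HashMap::new();
    for seq in seqs {
        for w in seq.windows(2) {
            *counts.entry((w[0], w[1])).or_insert(0usize) += 1;
        }
    }
    counts
}

/// Replace every non-overlapping occurrence of `pair` with `new_id`.
fn apply_merge(seq: &[Token], pair: (Token, Token), new_id: Token) -> Vec<Token> {
    let mut out = Vec::with_capacity(seq.len());
    let mut i = 0;
    while i < seq.len() {
        if i + 1 < seq.len() && (seq[i], seq[i + 1]) == pair {
            out.push(new_id);
            i += 2;
        } else {
            out.push(seq[i]);
            i += 1;
        }
    }
    out
}

/// Greedily learn up to `budget` merges, mutating `seqs` in place and
/// returning (pair, new_id, frequency) triples.
fn learn_merges(
    seqs: &mut [Vec<Token>],
    budget: usize,
    next_id: &mut Token,
) -> Vec<((Token, Token), Token, usize)> {
    let mut merges = Vec::new();
    for _ in 0..budget {
        let counts = pair_counts(seqs);
        let Some((&pair, &freq)) = counts.iter().max_by_key(|&(_, &c)| c) else {
            break;
        };
        if freq < 2 {
            break; // no pair repeats; further merges would not compress
        }
        let id = *next_id;
        *next_id += 1;
        for seq in seqs.iter_mut() {
            let merged = apply_merge(seq, pair, id);
            *seq = merged;
        }
        merges.push((pair, id, freq));
    }
    merges
}

fn main() {
    // Toy corpus of raw bytes, pre-split into fixed 8-byte "chunks"
    // (the real chunking scheme is not spelled out in the PR summary).
    let text = b"the cat sat on the mat the cat sat on the hat ";
    let mut next_id: Token = 256; // ids 0..=255 are reserved for raw bytes

    // Phase 1: learn merges strictly inside each chunk.
    let mut chunks: Vec<Vec<Token>> = text
        .chunks(8)
        .map(|c| c.iter().map(|&b| b as Token).collect())
        .collect();
    let phase1 = learn_merges(&mut chunks, 16, &mut next_id);

    // Phase 2: concatenate the phase-1-encoded chunks and learn further
    // merges, which may now span chunk boundaries. (A stricter variant
    // would count only boundary-crossing pairs here.)
    let mut stream = [chunks.concat()];
    let phase2 = learn_merges(&mut stream, 16, &mut next_id);

    // Interleave merges from both phases by frequency into one table.
    let mut table: Vec<_> = phase1.into_iter().chain(phase2).collect();
    table.sort_by(|a, b| b.2.cmp(&a.2));
    for (pair, id, freq) in &table {
        println!("merge {:?} -> {} (freq {})", pair, id, freq);
    }
}
```

Re-sorting the combined table by frequency mirrors the PR's "interleaved by frequency" description; in a standard lowest-rank-first BPE encoder this reordering should still be safe, since a phase-2 merge that consumes a phase-1 token cannot fire until that token has been produced.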