| Field | Value |
| --- | --- |
| val_bpb (validation bits per byte) | 1.2294 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact Size | 15.1 MiB |
| Training Techniques | Other |
Superchunk BPE tokenization: a two-phase BPE scheme in which the first phase learns merges within chunks and the second phase learns cross-chunk merges, with both sets interleaved by frequency into a single merge table.
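
A minimal sketch of that two-phase scheme, assuming whitespace-split chunks and hypothetical names (`pair_counts`, `apply_merge`, `learn_merges` are illustrative, not the submission's API). It simplifies phase 2 by scanning the whole concatenated stream rather than only boundary-spanning pairs, and it ignores ordering constraints between dependent merges:

```rust
use std::collections::HashMap;

type Token = u32;

/// Count adjacent token pairs across all sequences.
fn pair_counts(seqs: &[Vec<Token>]) -> HashMap<(Token, Token), usize> {
    let mut counts = HashMap::new();
    for seq in seqs {
        for w in seq.windows(2) {
            *counts.entry((w[0], w[1])).or_insert(0) += 1;
        }
    }
    counts
}

/// Replace every occurrence of `pair` in `seq` with `new_tok`.
fn apply_merge(seq: &[Token], pair: (Token, Token), new_tok: Token) -> Vec<Token> {
    let mut out = Vec::with_capacity(seq.len());
    let mut i = 0;
    while i < seq.len() {
        if i + 1 < seq.len() && (seq[i], seq[i + 1]) == pair {
            out.push(new_tok);
            i += 2;
        } else {
            out.push(seq[i]);
            i += 1;
        }
    }
    out
}

/// Greedily learn up to `n_merges` merges, returning (pair, new token, frequency).
fn learn_merges(
    seqs: &mut [Vec<Token>],
    n_merges: usize,
    next_tok: &mut Token,
) -> Vec<((Token, Token), Token, usize)> {
    let mut merges = Vec::new();
    for _ in 0..n_merges {
        let counts = pair_counts(seqs);
        let Some((&pair, &freq)) = counts.iter().max_by_key(|(_, &c)| c) else {
            break;
        };
        if freq < 2 {
            break; // nothing left worth merging
        }
        let tok = *next_tok;
        *next_tok += 1;
        for seq in seqs.iter_mut() {
            *seq = apply_merge(seq, pair, tok);
        }
        merges.push((pair, tok, freq));
    }
    merges
}

fn main() {
    // Toy corpus; in the real setup chunks would come from the training text.
    let text = "the cat sat on the mat the cat sat";
    let chunks: Vec<Vec<Token>> = text
        .split_whitespace()
        .map(|c| c.bytes().map(Token::from).collect())
        .collect();

    let mut next_tok: Token = 256; // 0..=255 are the raw byte tokens

    // Phase 1: merges learned strictly inside chunks.
    let mut in_chunk = chunks;
    let phase1 = learn_merges(&mut in_chunk, 8, &mut next_tok);

    // Phase 2: concatenate the merged chunks into one stream so pairs that
    // span chunk boundaries become visible, then learn further merges.
    let mut stream = vec![in_chunk.concat()];
    let phase2 = learn_merges(&mut stream, 8, &mut next_tok);

    // Single merge table: interleave both phases by descending frequency.
    // (A real implementation must keep any merge after the merges that
    // produce its input tokens.)
    let mut table: Vec<_> = phase1.into_iter().chain(phase2).collect();
    table.sort_by(|a, b| b.2.cmp(&a.2));
    for (pair, tok, freq) in &table {
        println!("{pair:?} -> {tok} (freq {freq})");
    }
}
```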
Novel Contributions
- Introduction of Superchunk BPE tokenization, combining phase-1 (within-chunk) and phase-2 (cross-chunk) merges into a single merge table.
- A Rust BPE tokenizer with superchunking at vocabulary size 1024 (see the encoding sketch after this list).
- A short training run on 8×H100 GPUs (600 s wall clock) targeting the non-record 16 MB track.
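
For completeness, applying such a table at encode time could look like the sketch below. The `encode` function is hypothetical and reuses `Token` and `apply_merge` from the earlier sketch; the submission's actual Rust tokenizer is not shown here:

```rust
// With a 1024-token budget, one plausible split is 256 raw byte tokens plus
// up to 768 learned merges (an assumption; the submission does not spell
// out the split).
fn encode(bytes: &[u8], table: &[((Token, Token), Token, usize)]) -> Vec<Token> {
    let mut seq: Vec<Token> = bytes.iter().map(|&b| Token::from(b)).collect();
    // Apply merges in table order; each pass rewrites the whole sequence.
    for &(pair, tok, _) in table {
        seq = apply_merge(&seq, pair, tok);
    }
    seq
}
```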