val_bpb: 1.1276
Architecture: Transformer
Optimizer: Muon
Artifact Size: 15.3 MB
Training Techniques
Architecture
- XSA: applied across all 11 layers of the referenced architecture (parameters: {"layers": 11})
- BigramHash: component used in the model architecture (parameters: {"dimensions": "3072x112"})
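The submission does not spell out how BigramHash works, so the following is only one plausible reading of the "3072x112" dimensions: a hashed bigram-embedding table with 3072 buckets of 112 dimensions, where each adjacent token pair is hashed into a bucket and that row contributes a feature for the current position. Everything here (function name, hashing scheme, how the rows are consumed) is an assumption for illustration.

```python
def bigram_hash_embed(tokens, table, n_buckets=3072):
    """Hypothetical sketch of a hash-bucketed bigram feature.

    Each (previous, current) token pair is hashed into one of
    n_buckets rows of `table` (assumed 3072 x 112), and that row is
    returned as the bigram feature for the current position. A
    sentinel of -1 stands in for "no previous token".
    """
    features = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else -1
        bucket = hash((prev, tok)) % n_buckets  # deterministic for ints
        features.append(table[bucket])
    return features
```

A real implementation would add these rows to the ordinary token embeddings; the sketch only shows the bucket lookup.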
Quantization
- GPTQ: bits: 6, scope: all
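GPTQ proper picks each quantized weight via Hessian-weighted error compensation, solved layer by layer; the sketch below is not GPTQ, only the signed 6-bit round-to-nearest grid that such quantized weights end up on, as a simplified stand-in for what "bits: 6, scope: all" means for every tensor.

```python
def quantize_rtn(weights, bits=6):
    """Per-tensor symmetric round-to-nearest onto a signed 2**bits grid.

    GPTQ chooses the rounding more carefully (compensating each
    column's error using second-order information), but its outputs
    live on this same integer grid: [-32, 31] for 6 bits.
    """
    qmax = 2 ** (bits - 1) - 1          # 31 for 6 bits
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map grid integers back to floats."""
    return [v * scale for v in q]
```

With round-to-nearest, every reconstructed weight is within half a quantization step (scale / 2) of the original.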
Optimizer
- Muon: weight_decay: 0.04, momentum: null, other_params: null
Weight Averaging
- EMA: parameters: {"decay": 0.997}
- SWA: parameters: {"every_steps": 50}
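The two averaging schemes above can be sketched as follows, with parameters kept as plain floats in a dict for illustration (a real run would operate on tensors). EMA blends every step with decay 0.997; SWA keeps an equal-weight running mean of snapshots taken every 50 steps.

```python
import copy

def ema_update(avg_params, model_params, decay=0.997):
    """One EMA step: avg <- decay * avg + (1 - decay) * current."""
    for k in avg_params:
        avg_params[k] = decay * avg_params[k] + (1.0 - decay) * model_params[k]
    return avg_params

class SWA:
    """Equal-weight average of snapshots taken every `every_steps` steps."""

    def __init__(self, every_steps=50):
        self.every_steps = every_steps
        self.avg = None   # running mean of snapshots
        self.n = 0        # number of snapshots averaged so far

    def maybe_update(self, step, model_params):
        if step % self.every_steps != 0:
            return
        self.n += 1
        if self.avg is None:
            self.avg = copy.deepcopy(model_params)
        else:
            # incremental mean: avg += (x - avg) / n
            for k in self.avg:
                self.avg[k] += (model_params[k] - self.avg[k]) / self.n
```

How the two averages are combined (or whether one is applied on top of the other) is not specified in the submission.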
Compression
- lzma: level: 9
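A minimal sketch of the compression step, assuming the serialized weights arrive as raw bytes; `preset=9` in Python's standard-library `lzma` module corresponds to the level above.

```python
import lzma

def compress_artifact(raw: bytes) -> bytes:
    """Compress serialized weights with LZMA at its maximum preset (9)."""
    return lzma.compress(raw, preset=9)

def decompress_artifact(blob: bytes) -> bytes:
    """Recover the original serialized bytes."""
    return lzma.decompress(blob)
```

The 15.3 MB artifact size above would be measured on the compressed blob.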
Evaluation
- sliding window eval: parameters: {"stride": 64}
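A sketch of stride-64 sliding-window evaluation, assuming the common scheme where each window after the first re-scores only its final `stride` tokens, so every token is scored exactly once with up to `window - stride` tokens of context. `nll_fn` is a hypothetical stand-in for the model, returning one NLL per token of its input; the window length here is illustrative, since the submission leaves eval_length unspecified.

```python
def sliding_window_nll(nll_fn, tokens, window=256, stride=64):
    """Sum per-token NLL over `tokens`, scoring each token exactly once.

    The first window scores all of its tokens; every later window
    slides forward by `stride` and keeps only the NLLs of its last
    `stride` tokens, using the rest purely as context.
    """
    total = 0.0
    scored_upto = 0
    while scored_upto < len(tokens):
        end = min(window if scored_upto == 0 else scored_upto + stride,
                  len(tokens))
        start = max(0, end - window)
        nlls = nll_fn(tokens[start:end])
        total += sum(nlls[scored_upto - start:])  # only unscored positions
        scored_upto = end
    return total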
Sequence Length
- train_length: null
- eval_length: null
Novel Contributions
- Novel BESE tokenizer replacing SentencePiece-1024
- 287-token vocabulary built from a 38-token structured base alphabet plus 249 BPE merges
- Tokenizer design inspired by T9 phone keyboards, Huffman coding, Bionic Reading, and hierarchical encoding
- Custom data preparation pipeline that decodes SentencePiece shards, trains BPE, and re-encodes shards
- Tokenizer-agnostic BPB verification for custom-tokenizer evaluation
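The tokenizer-agnostic BPB check in the last bullet reduces to one formula: the model's total negative log-likelihood over the text (however that text was tokenized) divided by the raw UTF-8 byte count, converted to bits. Because the denominator never depends on the tokenization, scores are comparable between the BESE tokenizer and SentencePiece-1024. A minimal sketch:

```python
import math

def bits_per_byte(total_nll_nats: float, total_utf8_bytes: int) -> float:
    """Tokenizer-agnostic bits-per-byte.

    `total_nll_nats` is the summed NLL over all tokens of the eval text
    (in nats, from any tokenizer); `total_utf8_bytes` is the byte count
    of the raw text. Dividing by ln(2) converts nats to bits.
    """
    return total_nll_nats / (math.log(2) * total_utf8_bytes)
```

A 287-token vocabulary spends fewer bytes per token than a 1024-token one, so comparing per-token losses directly would be misleading; normalizing by bytes removes that bias.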