PR #200

open

Record: SP4096 + Int6 QAT + NorMuon (val_bpb=1.2012)

val_bpb

1.2012

Architecture

Transformer

Optimizer

NorMuon

Artifact Size

14,342,773 bytes

Training Techniques

Quantization

STE QAT

bits: 6

scope: all
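As a rough illustration of what STE QAT at 6 bits involves, the sketch below fake-quantizes a weight matrix in the forward pass and lets the gradient pass through the rounding step unchanged. It is a minimal NumPy sketch, not this PR's training code: the per-row scale choice, the function names `fake_quant_forward` and `ste_backward`, and the toy loss are all assumptions made for illustration.

```python
import numpy as np

QMAX = 31  # symmetric int6 range [-31, 31], as listed in this record


def fake_quant_forward(w, qmax=QMAX):
    """Quantize then dequantize each row of w (assumed per-row scale)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0.0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale


def ste_backward(grad_wrt_quantized):
    """Straight-through estimator: treat round() as identity in the backward pass."""
    return grad_wrt_quantized


# Toy training step (hypothetical loss): 0.5 * ||fake_quant(w) - target||^2
w = np.array([[0.4, -1.0, 0.26], [0.0, 0.0, 0.0]])
target = np.zeros_like(w)
w_q = fake_quant_forward(w)
grad_w = ste_backward(w_q - target)  # gradient reaches the full-precision weights
w_updated = w - 0.1 * grad_w         # plain SGD step on the latent weights
```

The full-precision weights are what the optimizer updates; the quantized copy only exists inside the forward pass.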

Architecture

tied embeddings

Uses tied input/output embeddings.

parameters: null
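Weight tying can be sketched as a single matrix used both to look up input embeddings and to project hidden states to logits. The example below is an assumption-laden illustration: the vocabulary size matches the SP4096 tokenizer in this record, while `d_model`, the initialization scale, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 4096  # from the SP4096 tokenizer in this record
d_model = 64       # illustrative value only, not from this record

# One shared parameter matrix for both the input and output side.
embedding = (rng.standard_normal((vocab_size, d_model)) * 0.02).astype(np.float32)


def embed(token_ids):
    """Input side: look up one row per token id."""
    return embedding[token_ids]


def logits(hidden):
    """Output side: project hidden states with the transpose of the same matrix."""
    return hidden @ embedding.T


tokens = np.array([1, 17, 4095])
hidden = embed(tokens)
out = logits(hidden)

tied_params = embedding.size
untied_params = 2 * embedding.size  # what separate input/output matrices would cost
```

Tying halves the embedding parameter count, which matters when the artifact size is part of the record.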

Optimizer

Muon

weight_decay: null

momentum: 0.99

other_params: {"variant":"NorMuon"}

Compression

zstd

level: 22

Sequence Length

sequence_length

train_length: 1024

eval_length: null

LR Schedule

warmdown

parameters: {"warmdown_iters":3000,"warmup_start_momentum":0.92,"warmup_steps":1500}
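One plausible reading of these parameters is a learning-rate multiplier that stays at 1.0 and then decays linearly to zero over the final 3000 iterations, combined with a momentum ramp from 0.92 to the optimizer's 0.99 over the first 1500 steps. The record does not say whether `warmup_steps` applies to momentum or to the learning rate; the sketch below assumes momentum because it is listed next to `warmup_start_momentum`. The `total_steps` argument and both function names are hypothetical.

```python
WARMDOWN_ITERS = 3000          # from this record
WARMUP_START_MOMENTUM = 0.92   # from this record
WARMUP_STEPS = 1500            # from this record
FINAL_MOMENTUM = 0.99          # optimizer momentum listed in this record


def lr_multiplier(step, total_steps, warmdown_iters=WARMDOWN_ITERS):
    """Return 1.0 before the warmdown window, then decay linearly to 0."""
    warmdown_start = max(total_steps - warmdown_iters, 0)
    if step < warmdown_start:
        return 1.0
    return max(total_steps - step, 0) / warmdown_iters


def momentum_at(step, start=WARMUP_START_MOMENTUM, final=FINAL_MOMENTUM,
                warmup_steps=WARMUP_STEPS):
    """Linearly interpolate momentum from start to final (assumed interpretation)."""
    frac = min(step / warmup_steps, 1.0)
    return (1.0 - frac) * start + frac * final
```

For example, with a hypothetical `total_steps=10000`, the multiplier is 1.0 through step 7000 and 0.5 at step 8500.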

Other

other

SP4096 SentencePiece BPE tokenizer (vocabulary size 4096) with 26% better text compression than sp1024.

parameters: {"vocab_size":4096,"compression_improvement":"26%"}
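Tokenizer compression matters here because a bits-per-byte metric normalizes the loss by raw bytes rather than by tokens. The sketch below shows the standard conversion from mean cross-entropy in nats per token to bits per byte. The record does not state how the 26% figure was measured, and the token and byte counts in the example are made-up numbers, not values from this PR.

```python
import math


def bits_per_byte(mean_nats_per_token, n_tokens, n_bytes):
    """Convert mean per-token cross-entropy (nats) to bits per byte of raw text."""
    bits_per_token = mean_nats_per_token / math.log(2)
    return bits_per_token * (n_tokens / n_bytes)


# Hypothetical numbers: same text, same per-token loss, fewer tokens.
n_bytes = 1_000_000
coarse = bits_per_byte(2.0, n_tokens=400_000, n_bytes=n_bytes)
finer = bits_per_byte(2.0, n_tokens=300_000, n_bytes=n_bytes)
```

In practice a larger vocabulary also raises the per-token loss, so the net effect on bits per byte has to be measured rather than assumed.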

other

Per-row int6 quantization of weights to the range [-31, 31]. Embeddings are passed through in fp16, and the artifact is compressed with zstd at level 22.

parameters: {"range":"[-31,31]"}
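Per-row quantization to the range [-31, 31] can be sketched as follows: each row gets its own scale so that its largest absolute value maps to 31, and the integer codes are stored alongside the scales. This is an illustrative NumPy sketch, not the PR's export code. Storing codes in `int8` and scales in `float32`, and the names `quantize_rows` and `dequantize_rows`, are assumptions. Bit-packing, the fp16 embedding passthrough, and the zstd step are omitted.

```python
import numpy as np

QMAX = 31  # symmetric int6 range [-31, 31], as listed in this record


def quantize_rows(w):
    """Return integer codes in [-31, 31] and one scale per row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / QMAX
    scale = np.where(scale == 0.0, 1.0, scale)  # guard all-zero rows
    codes = np.clip(np.round(w / scale), -QMAX, QMAX).astype(np.int8)
    return codes, scale


def dequantize_rows(codes, scale):
    """Reconstruct approximate weights from codes and per-row scales."""
    return codes.astype(np.float32) * scale


rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 16)).astype(np.float32)
codes, scale = quantize_rows(weights)
restored = dequantize_rows(codes, scale)
max_error = np.abs(restored - weights).max()
```

With round-to-nearest, the reconstruction error in each row is bounded by half that row's scale, which is why a per-row scale tends to be tighter than a single scale for the whole matrix.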