PR #2086
Non-record: Quinary quantization + SP16384 + per-group lrzip + TTT - bpb 1.1384
by deniskurlov
val_bpb: 1.1384
Architecture: Hybrid
Optimizer: Muon
Artifact Size: 15.72 MB
Training Techniques
Quantization
STE QAT
bits: 5
scope: quinary weights
QAT
bits: 8
scope: non-quantized linears
mixed int5/fp16
bits: 5
scope: scales and quinary weights
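A minimal sketch of the quinary STE fake-quantization step, assuming a per-tensor absmean scale (the scale scheme and function name are illustrative; the listing specifies only quinary levels trained with a straight-through estimator):

```python
import torch

def quantize_quinary_ste(w: torch.Tensor) -> torch.Tensor:
    # Per-tensor absmean scale is an assumption; the PR specifies only
    # quinary levels {-2,-1,0,+1,+2} trained with STE QAT.
    scale = w.abs().mean().clamp(min=1e-8)
    q = torch.round(w / scale).clamp(-2, 2)  # snap to the five levels
    w_q = q * scale
    # STE: the forward pass sees w_q, gradients flow to w unchanged.
    return w + (w_q - w).detach()
```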
Architecture
weight tying
Shared input/output embeddings (weight tying) with a 380->576 bottleneck projection.
parameters: {"embed_bottleneck":"380->576","vocab_size":16384}
U-Net skip connections
Symmetric 10-layer U-Net with 5 encoder and 5 decoder layers.
parameters: {"layers":10,"encoder_layers":5,"decoder_layers":5,"model_dim":576}
GQA
Grouped-query attention with 6 query heads and 3 KV heads.
parameters: {"query_heads":6,"kv_heads":3,"head_dim":96}
ReLU²
MLP uses the ReLU² activation with 4x expansion.
parameters: {"mlp_mult":4,"hidden_dim":2304}
RoPE
YaRN rotary positional encoding with extended context.
parameters: {"base":5000,"max_len":2048}
Regularization
logit softcap
parameters: {"type":"poly5","cap":10}
Optimizer
Muon
weight_decay: 0
momentum: 0.95
other_params: {"backend_steps":3,"momentum_warmup_start":0.85,"momentum_warmup_steps":500}
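A sketch of the momentum warmup implied by other_params, assuming linear interpolation from 0.85 to 0.95 over the first 500 steps; backend_steps=3 is Muon's internal Newton-Schulz iteration count and is not shown here:

```python
def muon_momentum(step: int, start=0.85, end=0.95, warmup_steps=500) -> float:
    # Linear interpolation is an assumption; the listing gives only the
    # endpoints and the warmup length.
    frac = min(step / warmup_steps, 1.0)
    return start + frac * (end - start)
```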
Compression
custom
level: null
Evaluation
sliding window eval
parameters: {"stride":16,"train_seq_len":1024}
Test-Time Training
score-first TTT
parameters: {"epochs":3,"learning_rate":0.005,"tokens":32768}
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
LR Schedule
warmdown
parameters: {"warmdown_fraction":0.2,"warmup_steps":5}
Novel Contributions
- Quinary {-2,-1,0,+1,+2} weight quantization with base-5 packing (packing sketch after this list)
- Layout-aware per-stream archive with LZMA-screened layout selection and lrzip-zpaq fallback
- SP16384 SentencePiece tokenizer trained from scratch
- 5-bit log-delta scale quantization
- Score-first TTT on fp16 calibration parameters only
- Quinary adaptation of the ternary U-Net record with reduced model dimension and adjusted GQA/embedding bottleneck
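A sketch of base-5 packing at three quinary digits per byte (5**3 = 125 <= 256, i.e. ~2.67 bits per weight); the three-per-byte grouping is an assumption, since the contribution names only base-5 packing:

```python
import numpy as np

def pack_quinary(q: np.ndarray) -> bytes:
    d = (q.astype(np.int64) + 2).ravel()          # shift {-2..2} -> {0..4}
    d = np.concatenate([d, np.zeros((-len(d)) % 3, dtype=np.int64)])
    trip = d.reshape(-1, 3)
    return (trip[:, 0] * 25 + trip[:, 1] * 5 + trip[:, 2]).astype(np.uint8).tobytes()

def unpack_quinary(buf: bytes, n: int) -> np.ndarray:
    b = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
    digits = np.stack([b // 25, (b // 5) % 5, b % 5], axis=1).ravel()[:n]
    return (digits - 2).astype(np.int8)
```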