## Summary

| Field | Value |
| --- | --- |
| val_bpb (validation bits per byte) | 1.1920 |
| Architecture | Transformer |
| Optimizer | Muon |
| Artifact Size | 15,415,044 bytes |
## Training Techniques

### Quantization

**late QAT**: bits not recorded; scope: tensors in the final fixed bit plan (see the sketch after this subsection).

**GPTQ**: bits not recorded; scope: artifact-compiler refinement.
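A minimal sketch of late fake-quant training against a fixed per-tensor bit plan, assuming symmetric per-tensor quantization with a straight-through estimator. The `BIT_PLAN` mapping, tensor names, and helper functions are illustrative stand-ins, not the submission's actual code.

```python
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through
    estimator: forward sees quantized values, backward passes gradients
    through unchanged."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    return w + (q - w).detach()  # straight-through estimator

# Hypothetical plan: tensor name -> bit width (a stand-in for budgeted_v2).
BIT_PLAN = {"blocks.0.attn.qkv.weight": 4, "blocks.0.mlp.fc.weight": 5}

def apply_bit_plan(model: torch.nn.Module) -> dict:
    """Fake-quantized views of every parameter covered by the plan."""
    return {name: fake_quant(p, BIT_PLAN[name])
            for name, p in model.named_parameters()
            if name in BIT_PLAN}
```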
### Compression

**lzma**: compression level not recorded.
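A minimal sketch of lzma-compressing the compiled artifact. The report does not record a compression level, so the `preset` value, file name, and pickle packing here are assumptions.

```python
import lzma
import pickle

def write_artifact(tensors: dict, path: str = "artifact.bin") -> int:
    """Serialize the packed tensors and LZMA-compress them into the
    final artifact; returns the compressed size in bytes."""
    blob = lzma.compress(pickle.dumps(tensors),
                         preset=9 | lzma.PRESET_EXTREME)  # assumed preset
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)
```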
### Architecture

**weight tying**: tied 1024-token SentencePiece embeddings, i.e. the input embedding and the output projection share one weight matrix.
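A minimal sketch of the tying itself, using the recorded 1024-token vocabulary; the model dimension and class name are illustrative.

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Input embedding and output head share one (vocab, dim) matrix."""
    def __init__(self, vocab_size: int = 1024, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size, bias=False)
        self.head.weight = self.embed.weight  # weight tying

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        h = self.embed(idx)   # transformer blocks would run here
        return self.head(h)   # logits reuse the embedding matrix
```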
**BigramHash**: hashed-bigram vocabulary component (vocab_size: 10240, dim: 128).
**SmearGate**: SmearGate and scalar control tensors are used in the model; no parameters recorded.
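SmearGate is not a standard component; one plausible reading, assumed here, is a learned scalar gate that "smears" each position's embedding with its predecessor's. The scalar gate doubles as one of the "scalar control tensors"; shapes and the blend rule are illustrative.

```python
import torch
import torch.nn as nn

class SmearGate(nn.Module):
    """Blend each position with the previous one via a learned gate."""
    def __init__(self):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(1))  # scalar control tensor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, seq, dim)."""
        prev = torch.cat([x[:, :1], x[:, :-1]], dim=1)
        g = torch.sigmoid(self.gate)
        return x + g * prev
```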
### Optimizer

**Muon**: MuonW variant for matrix parameters, with AdamW for embeddings and scalars; weight_decay and momentum not recorded.
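A minimal sketch of a Muon-style update: heavy-ball momentum followed by Newton-Schulz orthogonalization of the 2-D update, with decoupled weight decay standing in for the "MuonW" variant (an assumption). Embeddings and scalars would be handled by AdamW separately, as the run card states; hyperparameter defaults below are illustrative.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize G (push singular values toward 1)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # commonly used quintic coefficients
    X = G / (G.norm() + 1e-7)
    if X.shape[0] > X.shape[1]:
        X = X.T  # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if G.shape[0] > G.shape[1] else X

@torch.no_grad()
def muon_step(p, grad, buf, lr=0.02, momentum=0.95, weight_decay=0.0):
    """One update for a single matrix parameter; buf holds momentum state."""
    buf.mul_(momentum).add_(grad)
    update = newton_schulz(buf)
    p.mul_(1 - lr * weight_decay)  # decoupled weight decay ("MuonW" reading)
    p.add_(update, alpha=-lr)
```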
### LR Schedule

**warmdown**: hybrid mode, 3700 iterations, cosine curve, min_lr_scale 0.05.
### Regularization

**weight decay**: no parameters recorded.
### Sequence Length

**sequence_length**: train_length 524288; eval_length not recorded.
### Other

Fixed per-tensor bit plan ("budgeted_v2") shared between late fake-quant training and artifact compilation.
## Novel Contributions
- Artifact-aware late fake-quant training aligned to the exact final per-tensor bit plan
- Use of a fixed budgeted_v2 bit plan for both QAT and artifact compilation
- Exact compiled mixed-precision artifact roundtrip evaluation
- Single-run challenge-valid 10min_16mb submission with embedded runtime sources