PR #1839

open

Add artifact-aware LateQAT fixed-plan record

by cardona
val_bpb: 1.1920
Architecture: Transformer
Optimizer: Muon
Artifact Size: 15,415,044 bytes

Training Techniques

Quantization
  • late QAT (bits: null; scope: tensors in final fixed bit plan)
  • GPTQ (bits: null; scope: artifact compiler refinement)
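The core idea of artifact-aware late QAT can be sketched as fake-quantizing exactly the tensors named in the fixed bit plan, so training sees the same values the compiled artifact will store. Everything below is illustrative: the tensor names and bit widths are placeholders, not the actual budgeted_v2 plan (the record leaves the real bit widths unspecified, bits: null).

```python
import numpy as np

# Hypothetical per-tensor bit plan; names and widths are illustrative,
# not the PR's budgeted_v2 plan.
BIT_PLAN = {"embed.weight": 4, "attn.qkv.weight": 3, "lm_head.weight": 4}

def fake_quant(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor fake quantization: quantize then dequantize,
    so the training forward pass matches the compiled artifact exactly."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    if scale == 0:
        return w
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def apply_bit_plan(params: dict) -> dict:
    # Only tensors named in the fixed plan are fake-quantized; the same
    # plan would be reused verbatim by the artifact compiler.
    return {name: fake_quant(w, BIT_PLAN[name]) if name in BIT_PLAN else w
            for name, w in params.items()}
```

Because the plan is fixed before the late-QAT phase, quantization error seen at train time and at artifact-decode time is identical by construction.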
Compression
  • lzma (level: null)
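The artifact compression step might look like the following, assuming plain `lzma` from the Python standard library; the preset is an assumption, since the record lists level: null.

```python
import lzma

# Minimal sketch: the artifact bytes (packed quantized tensors plus
# embedded runtime sources) are LZMA-compressed. Preset choice is an
# assumption, not the PR's actual setting.
def compress_artifact(raw: bytes) -> bytes:
    return lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)

def decompress_artifact(blob: bytes) -> bytes:
    return lzma.decompress(blob)
```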
Architecture
  • weight tying: tied 1024-token SentencePiece embeddings (parameters: null)
  • BigramHash: vocabulary component (parameters: {"vocab_size":10240,"dim":128})
  • SmearGate: SmearGate and scalar control tensors used in the model (parameters: null)
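A minimal sketch of how the tied embeddings and the BigramHash component could fit together, assuming BigramHash hashes each (prev, cur) token pair into a fixed bucket table whose embedding is added to the token embedding. The hash function and the initialization are assumptions; vocab_size=10240 and dim=128 come from the record, and 1024 is the SentencePiece vocab size.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM = 1024, 128        # 1024-token SentencePiece vocab (record); DIM assumed shared
N_BUCKETS = 10240             # BigramHash vocab_size from the record

embed = rng.standard_normal((VOCAB, DIM)) * 0.02          # tied: also the output head
bigram_table = rng.standard_normal((N_BUCKETS, DIM)) * 0.02

def bigram_bucket(prev_id: int, cur_id: int) -> int:
    # Illustrative multiplicative hash into the bucket range.
    return (prev_id * 1000003 + cur_id) % N_BUCKETS

def embed_sequence(ids):
    # Token embedding plus hashed-bigram embedding at each position;
    # the first position pairs with a placeholder previous id of 0.
    out = np.empty((len(ids), DIM))
    prev = 0
    for i, cur in enumerate(ids):
        out[i] = embed[cur] + bigram_table[bigram_bucket(prev, cur)]
        prev = cur
    return out

def logits(hidden):
    # Weight tying: reuse the input embedding as the output projection,
    # so only one (VOCAB, DIM) tensor is stored in the artifact.
    return hidden @ embed.T
```

Hashing bigrams into a fixed bucket table gives pair-level features without storing a full 1024x1024 bigram matrix.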
Optimizer
  • Muon (weight_decay: null; momentum: null; other_params: {"variant":"MuonW for matrix parameters","adamw_for":"embeddings/scalars"})
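Muon's distinguishing step is orthogonalizing the momentum buffer of each 2D weight with a Newton-Schulz iteration. A sketch using the commonly published quintic coefficients; the hyperparameters are illustrative, and per the record embeddings and scalar tensors would use AdamW instead.

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Approximately orthogonalize G: push its singular values toward 1
    via a quintic Newton-Schulz iteration (coefficients as published)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)   # Frobenius normalization
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                          # keep the Gram matrix small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(w, grad, buf, lr=0.02, momentum=0.95):
    # Momentum accumulation, then orthogonalized update.
    buf = momentum * buf + grad
    return w - lr * newton_schulz(buf), buf
```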
LR Schedule
  • warmdown (parameters: {"mode":"hybrid","iters":3700,"curve":"cosine","min_lr_scale":0.05})
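A sketch of a hybrid warmdown schedule consistent with the recorded parameters: hold the base LR, then cosine-decay over the final 3700 iterations to 5% of peak. `total_iters` and `base_lr` are assumptions; 3700 and 0.05 come from the record.

```python
import math

def lr_at(step, total_iters, base_lr=1.0, warmdown_iters=3700, min_lr_scale=0.05):
    start = total_iters - warmdown_iters
    if step < start:
        return base_lr                              # constant phase
    frac = (step - start) / warmdown_iters          # 0 -> 1 over the warmdown
    cos = 0.5 * (1 + math.cos(math.pi * frac))      # 1 -> 0, cosine curve
    return base_lr * (min_lr_scale + (1 - min_lr_scale) * cos)
```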
Regularization
  • weight decay (parameters: null)
Sequence Length
  • sequence_length (train_length: 524288; eval_length: null)
Other
  • fixed per-tensor bit plan shared between late fake-quant training and artifact compilation (parameters: {"bit_plan":"budgeted_v2"})

Novel Contributions

  • Artifact-aware late fake-quant training aligned to the exact final per-tensor bit plan
  • Use of a fixed budgeted_v2 bit plan for both QAT and artifact compilation
  • Exact compiled mixed-precision artifact roundtrip evaluation
  • Single-run challenge-valid 10min_16mb submission with embedded runtime sources
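The roundtrip evaluation in the third bullet can be sketched as pack, compress, decompress, and compare bit-for-bit, so the evaluated weights are provably the artifact's weights. The packing format below (pickled int8 arrays plus per-tensor scales) is an assumption, not the PR's actual compiler format.

```python
import lzma
import pickle
import numpy as np

def pack(tensors):
    # tensors: {name: (int8 quantized array, float scale)}
    blob = pickle.dumps({k: (q.astype(np.int8), s) for k, (q, s) in tensors.items()})
    return lzma.compress(blob)

def unpack(artifact):
    return pickle.loads(lzma.decompress(artifact))

def roundtrip_exact(tensors):
    # Exact (not approximate) equality: the decoded artifact must match
    # the quantized tensors bit-for-bit.
    decoded = unpack(pack(tensors))
    return all(np.array_equal(tensors[k][0], decoded[k][0])
               and tensors[k][1] == decoded[k][1]
               for k in tensors)
```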