## Summary

| Field | Value |
| --- | --- |
| val_bpb (validation bits per byte) | 1.1920 |
| Architecture | Transformer |
| Optimizer | Muon |
| Artifact Size | 15,415,044 bytes |
## Training Techniques

### Quantization

**late QAT**: bits not recorded; scope: tensors in the final fixed bit plan (see the sketch after this subsection).

**GPTQ**: bits not recorded; scope: artifact-compiler refinement.
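A minimal sketch of late fake-quant training against a fixed per-tensor bit plan, assuming symmetric per-tensor quantization with a straight-through estimator. The `BIT_PLAN` mapping, tensor names, and helper functions are illustrative stand-ins, not the submission's actual code.

```python
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through
    estimator: forward sees quantized values, backward passes gradients
    through unchanged."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    return w + (q - w).detach()  # straight-through estimator

# Hypothetical plan: tensor name -> bit width (a stand-in for budgeted_v2).
BIT_PLAN = {"blocks.0.attn.qkv.weight": 4, "blocks.0.mlp.fc.weight": 5}

def apply_bit_plan(model: torch.nn.Module) -> dict:
    """Fake-quantized views of every parameter covered by the plan."""
    return {name: fake_quant(p, BIT_PLAN[name])
            for name, p in model.named_parameters()
            if name in BIT_PLAN}
```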
### Compression

**lzma**: compression level not recorded.
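A minimal sketch of lzma-compressing the compiled artifact. The report does not record a compression level, so the `preset` value, file name, and pickle packing here are assumptions.

```python
import lzma
import pickle

def write_artifact(tensors: dict, path: str = "artifact.bin") -> int:
    """Serialize the packed tensors and LZMA-compress them into the
    final artifact; returns the compressed size in bytes."""
    blob = lzma.compress(pickle.dumps(tensors),
                         preset=9 | lzma.PRESET_EXTREME)  # assumed preset
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)
```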
### Architecture

**weight tying**: tied 1024-token SentencePiece embeddings, i.e. the input embedding and the output projection share one weight matrix.
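A minimal sketch of the tying itself, using the recorded 1024-token vocabulary; the model dimension and class name are illustrative.

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Input embedding and output head share one (vocab, dim) matrix."""
    def __init__(self, vocab_size: int = 1024, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size, bias=False)
        self.head.weight = self.embed.weight  # weight tying

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        h = self.embed(idx)   # transformer blocks would run here
        return self.head(h)   # logits reuse the embedding matrix
```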
**BigramHash**: hashed-bigram vocabulary component (vocab_size: 10240, dim: 128).
**SmearGate**: SmearGate and scalar control tensors are used in the model; no parameters recorded.
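SmearGate is not a standard component; one plausible reading, assumed here, is a learned scalar gate that "smears" each position's embedding with its predecessor's. The scalar gate doubles as one of the "scalar control tensors"; shapes and the blend rule are illustrative.

```python
import torch
import torch.nn as nn

class SmearGate(nn.Module):
    """Blend each position with the previous one via a learned gate."""
    def __init__(self):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(1))  # scalar control tensor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, seq, dim)."""
        prev = torch.cat([x[:, :1], x[:, :-1]], dim=1)
        g = torch.sigmoid(self.gate)
        return x + g * prev
```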
### Optimizer

**Muon**: MuonW variant for matrix parameters, with AdamW for embeddings and scalars; weight_decay and momentum not recorded.
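A minimal sketch of a Muon-style update: heavy-ball momentum followed by Newton-Schulz orthogonalization of the 2-D update, with decoupled weight decay standing in for the "MuonW" variant (an assumption). Embeddings and scalars would be handled by AdamW separately, as the run card states; hyperparameter defaults below are illustrative.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize G (push singular values toward 1)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # commonly used quintic coefficients
    X = G / (G.norm() + 1e-7)
    if X.shape[0] > X.shape[1]:
        X = X.T  # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if G.shape[0] > G.shape[1] else X

@torch.no_grad()
def muon_step(p, grad, buf, lr=0.02, momentum=0.95, weight_decay=0.0):
    """One update for a single matrix parameter; buf holds momentum state."""
    buf.mul_(momentum).add_(grad)
    update = newton_schulz(buf)
    p.mul_(1 - lr * weight_decay)  # decoupled weight decay ("MuonW" reading)
    p.add_(update, alpha=-lr)
```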
### LR Schedule

**warmdown**: hybrid mode, 3700 iterations, cosine curve, min_lr_scale 0.05.
### Regularization

**weight decay**: no parameters recorded.
### Sequence Length

**sequence_length**: train_length 524288; eval_length not recorded.
### Other

Fixed per-tensor bit plan ("budgeted_v2") shared between late fake-quant training and artifact compilation.
## Novel Contributions
- Artifact-aware late fake-quant training aligned to the exact final per-tensor bit plan
- Use of a fixed budgeted_v2 bit plan for both QAT and artifact compilation
- Exact compiled mixed-precision artifact roundtrip evaluation
- Single-run challenge-valid 10min_16mb submission with embedded runtime sources