val_bpb: 1.2067
Architecture: Transformer
Optimizer: —
Artifact Size: —
Training Techniques

Quantization
- codebook quant (bits: null; scope: MLP/attn weights)
- int8 (bits: 8; scope: outlier tensors)
- QAT (bits: null; scope: codebook weights)
- GPTQ (bits: null; scope: codebook assignment/scales)
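The core codebook technique above maps each fixed-size block of weights to its nearest codeword under a per-block scale. A minimal sketch of that assignment step, using a random stand-in codebook (not the actual lattice codebook) and an illustrative max-abs scale:

```python
import numpy as np

def quantize_blockwise(w, codebook, block=8):
    """Map each block of w to its nearest codeword with one scale per
    block. Codebook and scale rule are illustrative stand-ins."""
    w = w.reshape(-1, block)                       # (n_blocks, block)
    scales = np.abs(w).max(axis=1, keepdims=True)  # crude per-block scale
    scales = np.where(scales == 0, 1.0, scales)
    normed = w / scales
    # brute-force nearest-codeword search; codebook has shape (K, block)
    d = ((normed[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)                         # codebook assignments
    deq = codebook[idx] * scales                   # dequantized blocks
    return idx, scales, deq

rng = np.random.default_rng(0)
cb = rng.standard_normal((256, 8))  # stand-in codebook, not the real one
w = rng.standard_normal(64)
idx, s, wq = quantize_blockwise(w, cb)
```

Only the integer indices and per-block scales need to be stored; the int8 path above would bypass this entirely for the hardest outlier tensors.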
Other
- Hadamard transform with random sign-flip and rotation to make blocked weights more isotropic before codebook assignment (parameters: null)
- Hessian-aware codebook index and scale assignment using GPTQ machinery and collected Hessians (parameters: null)
- Approximate codebook quantization run every 16 steps during training to provide quantization-aware feedback (parameters: {"interval_steps": 16})
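The sign-flip-plus-rotation preprocessing can be sketched as a randomized Hadamard transform: a random diagonal sign matrix followed by an orthonormal Hadamard rotation. Because the composite is orthogonal, it is exactly invertible at dequantization time. All function names below are illustrative:

```python
import numpy as np

def hadamard_matrix(n):
    """Orthonormal Hadamard matrix via Sylvester construction (n must
    be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def randomized_hadamard(w_block, signs):
    """Random sign flip, then Hadamard rotation: spreads energy evenly
    across coordinates so blocks look more isotropic to the codebook."""
    H = hadamard_matrix(w_block.shape[-1])
    return (w_block * signs) @ H

def inverse_randomized_hadamard(y, signs):
    """Exact inverse: undo the rotation, then the sign flips."""
    H = hadamard_matrix(y.shape[-1])
    return (y @ H.T) * signs
```

Since the transform is orthogonal, it changes the geometry of each block without changing its norm, which is what makes the subsequent nearest-codeword assignment better behaved.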
Regularization
- L2 loss (parameters: {"applied_to": "approximate codebook quantization", "schedule": "turned on at end of training"})
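The periodic approximate quantization and the auxiliary L2 loss work together: every 16 steps a cheap quantized snapshot of the weights is refreshed, and an L2 penalty pulls the weights toward it. A toy sketch, using uniform rounding as a stand-in for the approximate codebook quantizer (all hyperparameters and names here are assumptions):

```python
import numpy as np

def approx_quantize(w, step_size=0.1):
    # uniform rounding stands in for the approximate codebook quantizer
    return np.round(w / step_size) * step_size

def train(w, grad_fn, lr=0.1, steps=200, interval_steps=16, l2_weight=0.0):
    """Toy SGD loop: refresh an approximate quantization of w every
    `interval_steps` steps and add an auxiliary L2 pull toward it."""
    wq = approx_quantize(w)
    for step in range(steps):
        if step % interval_steps == 0:
            wq = approx_quantize(w)            # periodic, cheap re-quantization
        g = grad_fn(w) + l2_weight * (w - wq)  # grad of 0.5*l2*||w - wq||^2
        w = w - lr * g
    return w
```

Per the schedule above, `l2_weight` would stay at 0 for most of training and be switched on only near the end, nudging weights toward representable points just before the final quantization.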
Novel Contributions
- Fixed EP8 lattice codebook for weight compression
- Hadamard transform preprocessing for blocked weights
- Hessian-aware codebook assignment and scale selection
- Lightweight quantization-aware proxy training with periodic approximate quantization and auxiliary L2 loss
- Outlier-path fallback to int8 for hardest tensors