PR #1433

open

[Non-record] Codebooks! - val_bpb 1.2067 (3-seed mean)

by mtybadger
val_bpb: 1.2067
Architecture: Transformer
Optimizer:
Artifact Size:

Training Techniques

Quantization
  • codebook quant (bits: null, scope: MLP/attn weights)
  • int8 (bits: 8, scope: outlier tensors)
  • QAT (bits: null, scope: codebook weights)
  • GPTQ (bits: null, scope: codebook assignment/scales)
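As a rough illustration of per-block codebook quantization, the sketch below maps each weight block to its nearest codebook vector under a per-block scale. This is a minimal pure-Python simplification, not the PR's EP8 lattice implementation; `quantize_block`, `dequantize_block`, and the least-squares scale fit are our own naming and choices.

```python
def quantize_block(block, codebook):
    """Map one weight block to its nearest codebook vector under a
    per-block scale s, minimizing ||block - s*c||^2 over entries c.
    For a fixed c, the optimal scale is s = <block, c> / <c, c>."""
    best = None
    for idx, c in enumerate(codebook):
        cc = sum(x * x for x in c)
        if cc == 0:
            continue  # skip degenerate all-zero entries
        s = sum(w * x for w, x in zip(block, c)) / cc
        err = sum((w - s * x) ** 2 for w, x in zip(block, c))
        if best is None or err < best[0]:
            best = (err, idx, s)
    return best[1], best[2]  # (codebook index, per-block scale)

def dequantize_block(idx, scale, codebook):
    """Reconstruct an approximate block from its stored index and scale."""
    return [scale * x for x in codebook[idx]]
```

Per the listing above, tensors where this assignment error stays too high would instead take the plain int8 outlier path.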
Other
  • Hadamard transform with random sign-flip and rotation to make blocked weights more isotropic before codebook assignment (parameters: null)
  • Hessian-aware codebook index and scale assignment using GPTQ machinery and collected Hessians (parameters: null)
  • Approximate codebook quantization run every 16 steps during training to provide quantization-aware feedback (parameters: {"interval_steps":16})
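The sign-flipped Hadamard preprocessing listed above can be sketched as follows. This assumes power-of-two block lengths and random ±1 signs; the function names are ours, and the PR's actual rotation may differ in detail.

```python
import math

def hadamard(vec):
    """Fast Walsh-Hadamard transform of a list whose length is a
    power of two, normalized so the transform is orthonormal."""
    vec = list(vec)
    n = len(vec)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = vec[j], vec[j + h]
                vec[j], vec[j + h] = x + y, x - y
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in vec]

def randomized_hadamard(vec, signs):
    """Random sign-flip followed by a Hadamard transform: spreads
    energy across coordinates so weight blocks look more isotropic
    before codebook assignment (sketch of the idea, not the PR's
    exact rotation)."""
    return hadamard([s * v for s, v in zip(signs, vec)])
```

A spiky input such as a one-hot block is flattened into equal-magnitude coordinates, which is the isotropy property the codebook assignment benefits from.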
Regularization
  • L2 loss (parameters: {"applied_to":"approximate codebook quantization","schedule":"turned on at end of training"})
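A minimal sketch of the periodic quantization-aware feedback with its auxiliary L2 penalty, assuming the schedule simply leaves the penalty weight at zero until late in training. The coarse-grid rounding is a stand-in for approximate codebook quantization, and the function name and `aux_weight` argument are hypothetical.

```python
def qat_aux_loss(weights, step, interval_steps=16, aux_weight=0.0):
    """Every `interval_steps` optimizer steps, build an approximate
    quantized copy of the weights and return an auxiliary L2 penalty
    pulling the weights toward it. `aux_weight` stays 0.0 until the
    end-of-training schedule turns the penalty on."""
    if step % interval_steps != 0:
        return 0.0
    # Stand-in for approximate codebook quantization: snap to a coarse grid.
    quantized = [round(w * 4) / 4 for w in weights]
    return aux_weight * sum((w - q) ** 2 for w, q in zip(weights, quantized))
```

Running the approximation only every 16 steps keeps the feedback cheap relative to quantizing on every step.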

Novel Contributions

  • Fixed EP8 lattice codebook for weight compression
  • Hadamard transform preprocessing for blocked weights
  • Hessian-aware codebook assignment and scale selection
  • Lightweight quantization-aware proxy training with periodic approximate quantization and auxiliary L2 loss
  • Outlier-path fallback to int8 for the hardest tensors
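The Hessian-aware assignment above can be illustrated with a diagonal-Hessian-weighted variant of the nearest-codebook criterion: errors on high-curvature coordinates cost more. This is a simplification of the GPTQ-style machinery the PR describes (which uses full collected Hessians); the name and signature are ours.

```python
def hessian_aware_assign(block, codebook, hdiag):
    """Pick the codebook index and scale minimizing the weighted error
    sum_i h_i * (w_i - s * c_i)^2, where h_i is the diagonal Hessian
    estimate for coordinate i. For fixed c, the optimal scale is
    s = sum(h*w*c) / sum(h*c*c)."""
    best = None
    for idx, c in enumerate(codebook):
        den = sum(h * x * x for h, x in zip(hdiag, c))
        if den == 0:
            continue  # skip entries with no weighted support
        s = sum(h * w * x for h, w, x in zip(hdiag, block, c)) / den
        err = sum(h * (w - s * x) ** 2 for h, w, x in zip(hdiag, block, c))
        if best is None or err < best[0]:
            best = (err, idx, s)
    return best[1], best[2]  # (codebook index, scale)
```

With a large Hessian entry on one coordinate, a codebook vector that perturbs that coordinate is rejected even if it is closer in plain Euclidean distance.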