PR #2092
open
Non-record: GPTQ-lite Hessian Quantization + EMA — val_bpb 1.2142 (dim384, 11L, 15.5MB)
by dmpta
val_bpb
1.2142
Architecture
Transformer
Optimizer
—
Artifact Size
15.51 MB
Training Techniques
Quantization
GPTQ-lite
bits: 8
scope: all linear layers
QAT
bits: 8
scope: CastedLinear forward pass
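The QAT entry above describes fake quantization in the forward pass. The following is a minimal pure-Python sketch of that idea on a single scalar weight, not the PR's `CastedLinear` code. The function names, the fixed `scale`, and the toy squared-error loss are all assumptions made for illustration. The weight is rounded to the INT8 grid for the forward pass, while the gradient is passed to the full-precision weight as if rounding were the identity (a straight-through estimator).

```python
def fake_quantize(w, scale, bits=8):
    """Round w to a symmetric integer grid and map it back to float."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    q = max(-qmax, min(qmax, round(w / scale)))
    return q * scale


def train_scalar_qat(x, target, w=0.0, scale=0.01, lr=0.1, steps=50):
    """Toy QAT loop (hypothetical): fit y = w * x to target under fake quantization."""
    for _ in range(steps):
        w_q = fake_quantize(w, scale)   # forward pass sees the quantized weight
        y = w_q * x
        grad_y = 2.0 * (y - target)     # d/dy of (y - target)^2
        grad_w = grad_y * x             # straight-through: d(w_q)/d(w) treated as 1
        w -= lr * grad_w                # update the full-precision weight
    return w, fake_quantize(w, scale)


w_fp, w_int8 = train_scalar_qat(x=1.0, target=0.5)
print(w_fp, w_int8)
```

In a real layer the scale would typically be derived per row or per tensor from the weight range rather than fixed, and an autograd framework would handle the backward pass.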
Weight Averaging
EMA
parameters: {"decay":0.999,"start_step":100}
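The EMA parameters above (`decay` 0.999, `start_step` 100) can be read as in the sketch below. This is an assumption about the semantics, not the PR's implementation: here no averaging happens before `start_step`, and the shadow weights are seeded from the live weights on the first update at or after that step. The class name `EmaTracker` and the flat list of parameters are hypothetical.

```python
class EmaTracker:
    """Exponential moving average over a flat list of float parameters."""

    def __init__(self, decay=0.999, start_step=100):
        self.decay = decay
        self.start_step = start_step
        self.shadow = None  # averaged weights; None until averaging starts

    def update(self, params, step):
        if step < self.start_step:
            return  # assumed: skip early, noisy steps entirely
        if self.shadow is None:
            self.shadow = list(params)  # seed with the current weights
            return
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * p
                       for s, p in zip(self.shadow, params)]


ema = EmaTracker(decay=0.999, start_step=100)
for step in range(200):
    live_weights = [step * 0.01]  # stand-in for a training update
    ema.update(live_weights, step)
print(ema.shadow)
```

At evaluation or export time the shadow weights would replace the live weights before quantization.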
Architecture
KV head count
Standard attention with 8 heads and no GQA fallback fix applied in training script
parameters: {"heads":8,"layers":11,"dim":384}
Sequence Length
sequence_length
train_length: 2048
eval_length: null
Novel Contributions
- GPTQ-lite Hessian-diagonal per-row clip search for post-training INT8 quantization
- Near-lossless roundtrip quantization with a very small train/eval gap
- EMA decay and start-step tuning to avoid serialization collapse
- QAT via forward-pass fake quantization during training
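The first contribution in the list above, a Hessian-diagonal per-row clip search, could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: the Hessian diagonal is approximated as the per-feature sum of squared calibration activations, the candidate clip fractions are made up, and all function names are hypothetical. For each weight row, several clipping thresholds are tried and the one with the lowest Hessian-weighted reconstruction error is kept.

```python
def hessian_diag(calib_inputs):
    """diag(X^T X): per-input-feature sum of squared calibration activations."""
    n_in = len(calib_inputs[0])
    return [sum(x[j] ** 2 for x in calib_inputs) for j in range(n_in)]


def quantize_row(row, clip, bits=8):
    """Symmetric quantization of one row with clipping threshold `clip`."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    scale = clip / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale


def weighted_error(row, q, scale, h_diag):
    """Squared reconstruction error, weighted by the Hessian diagonal."""
    return sum(h * (w - qi * scale) ** 2 for h, w, qi in zip(h_diag, row, q))


def clip_search_row(row, h_diag, fractions=(1.0, 0.95, 0.9, 0.85, 0.8)):
    """Pick the clip fraction (assumed candidate set) with the lowest weighted error."""
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return [0] * len(row), 1.0  # all-zero row: any scale works
    best = None
    for frac in fractions:
        q, scale = quantize_row(row, frac * max_abs)
        err = weighted_error(row, q, scale, h_diag)
        if best is None or err < best[0]:
            best = (err, q, scale)
    return best[1], best[2]


def quantize_matrix(weight, calib_inputs):
    """Quantize each row independently; returns a list of (int_row, scale)."""
    h = hessian_diag(calib_inputs)
    return [clip_search_row(row, h) for row in weight]


weight = [[0.5, -0.25, 0.1], [0.0, 0.0, 0.0]]
calib = [[1.0, 2.0, 0.5], [0.5, 1.0, 1.0]]
for q_row, scale in quantize_matrix(weight, calib):
    print(q_row, scale)
```

Because the full range (fraction 1.0) is always among the candidates, the chosen clip can never do worse than plain max-abs scaling under this error measure.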