PR #1664
Status: open
WIP: Sequential GPTQ with Groupwise Int6 — improved post-training quantization on SP4096 base
by zoharb157
val_bpb: 1.0980
Architecture: Transformer
Optimizer: —
Artifact Size: 16 MB
Training Techniques
Quantization
- GPTQ (bits: 6, scope: all)
- mixed int6 (bits: 6, scope: all)
Other
- Sequential cross-layer GPTQ propagation: quantize layers one at a time, inject the quantized weights back into the model, and collect Hessians for later layers from the quantized activations.
  parameters: {"enabled": true}
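A minimal sketch of the sequential idea, not the PR's code: on a hypothetical two-layer ReLU model, the first layer is quantized and its quantized weights are injected back, so the Hessian proxy for the second layer is accumulated from the activations it will actually see at inference time.

```python
import numpy as np

def quantize_int6(W, scale):
    """Symmetric int6 round-to-nearest: levels in [-32, 31], then dequantize."""
    q = np.clip(np.round(W / scale), -32, 31)
    return q * scale

def rowwise_scale(W):
    """One fp scale per output row, mapping max |w| onto the int6 range."""
    return np.abs(W).max(axis=1, keepdims=True) / 31.0

rng = np.random.default_rng(0)
X  = rng.normal(size=(256, 16))   # calibration inputs (toy sizes, not SP4096)
W1 = rng.normal(size=(16, 16))    # layer-1 weight, (out, in)
W2 = rng.normal(size=(8, 16))     # layer-2 weight, quantized later

# Quantize layer 1, then inject the quantized weights back into the model.
W1q = quantize_int6(W1, rowwise_scale(W1))
A_fp = np.maximum(X @ W1.T, 0)    # activations under full-precision weights
A_q  = np.maximum(X @ W1q.T, 0)   # activations under quantized weights

# Hessian proxy H = A^T A / n collected from the *quantized* activations;
# layer 2 is then quantized against H_seq rather than the fp-activation Hessian.
H_seq = A_q.T @ A_q / len(A_q)
```

The design point is the ordering: because later layers are calibrated on quantized activations, their quantization compensates for error already introduced upstream.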
- Groupwise int6 scales with group size 128: one fp16 scale per group of 128 weights instead of one scale per row.
  parameters: {"group_size": 128}
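A sketch of groupwise scaling under the stated group_size=128 (the function name and toy shapes are illustrative, not from the PR): each row is split into contiguous groups of 128 weights, and each group gets its own fp16 max-abs scale.

```python
import numpy as np

def quantize_int6_groupwise(W, group_size=128):
    """Groupwise symmetric int6: one fp16 scale per contiguous group of
    `group_size` weights in each row, rather than one scale per row."""
    out, inp = W.shape
    assert inp % group_size == 0
    G = W.reshape(out, inp // group_size, group_size)
    # fp16 per-group scales: 2 bytes of overhead per 128 weights
    scales = (np.abs(G).max(axis=2, keepdims=True) / 31.0).astype(np.float16)
    q = np.clip(np.round(G / scales.astype(np.float32)), -32, 31)
    return (q * scales).reshape(out, inp), scales

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 256))
Wq, scales = quantize_int6_groupwise(W, group_size=128)
print(scales.shape)  # (4, 2, 1): 256 inputs / 128 per group = 2 scales per row
```

Smaller groups track local weight magnitudes more tightly than a single per-row scale, at the cost of storing one extra fp16 value per group.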
- Hessian-weighted scale selection: choose quantization scales that minimize the reconstruction error weighted by the Hessian diagonal.
  parameters: null
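One plausible form of this selection, sketched as a grid search (the candidate grid and Hessian values here are hypothetical): for each candidate scale s, score the weighted error sum_j h_jj (w_j - dequant(quant(w_j; s)))^2 and keep the minimizer.

```python
import numpy as np

def hessian_weighted_scale(w, h_diag, n_grid=64):
    """Pick the int6 scale minimizing the Hessian-diagonal-weighted
    reconstruction error over a grid of candidate scales."""
    s_max = np.abs(w).max() / 31.0          # plain max-abs scale
    best_s, best_err = s_max, np.inf
    for frac in np.linspace(0.5, 1.0, n_grid):
        s = frac * s_max
        q = np.clip(np.round(w / s), -32, 31)
        err = np.sum(h_diag * (w - q * s) ** 2)
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err

rng = np.random.default_rng(0)
w = rng.normal(size=128)
h = rng.uniform(0.1, 10.0, size=128)        # hypothetical Hessian diagonal

s_star, err_star = hessian_weighted_scale(w, h)

# Baseline: the unweighted max-abs scale, scored with the same weighting.
s_max = np.abs(w).max() / 31.0
q = np.clip(np.round(w / s_max), -32, 31)
err_max = np.sum(h * (w - q * s_max) ** 2)
```

Since the grid includes frac=1.0, the selected scale is never worse than plain max-abs under the weighted objective; shrinking the scale trades clipping error on a few large weights for finer resolution on the many small, high-curvature ones.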
Novel Contributions
- Sequential cross-layer GPTQ propagation
- Groupwise int6 scales with group_size=128
- Hessian-weighted scale selection
- Post-training quantization improvements with zero training-time cost