PR #1946

open

Record: base + AWQ-lite mixed-precision GPTQ — val_bpb 1.06086 (3-seed mean)

by aquariouseworkman
val_bpb: 1.0609
Architecture: Transformer
Optimizer:
Artifact Size: 15,978,503 bytes

Training Techniques

Quantization
  • GPTQ — bits: 6; scope: all weights, with the top-1 salient 64-column group at int8
  • mixed int6/int8 — bits: null; scope: top-1 salient 64-column group per matrix

Novel Contributions

  • Activation-aware mixed-precision GPTQ (AWQ-lite) applied on top of the PR #1855 stack
  • Selects the most salient 64-column group per matrix using activation RMS and mean absolute weight magnitude
  • Quantizes the selected salient group to int8 while keeping the rest at int6
  • Reports a 3-seed mean val_bpb of 1.06086
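The selection-and-quantization step described above can be sketched as follows. The saliency score (activation RMS times mean absolute weight magnitude per column) comes from the bullet list; the `quantize_dequantize` helper, the symmetric per-column rounding scheme, and the group-score aggregation are assumptions for illustration, not the PR's actual implementation.

```python
import numpy as np

def quantize_dequantize(w, bits):
    # Symmetric per-column fake quantization: round onto a `bits`-wide
    # integer grid and map back to floats (assumed scheme, not from the PR).
    qmax = 2 ** (bits - 1) - 1
    scale = np.maximum(np.abs(w).max(axis=0, keepdims=True), 1e-12) / qmax
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def awq_lite_quantize(w, act_rms, group_size=64, low_bits=6, high_bits=8):
    """Quantize weight matrix w (out_features x in_features) with one
    salient column group kept at high_bits; all others use low_bits.

    Saliency of column j = act_rms[j] * mean(|w[:, j]|); a group's score
    is the mean saliency over its columns (assumed aggregation).
    Assumes in_features is divisible by group_size.
    """
    n_in = w.shape[1]
    saliency = act_rms * np.abs(w).mean(axis=0)
    scores = saliency.reshape(-1, group_size).mean(axis=1)
    top = int(np.argmax(scores))  # top-1 salient group per matrix
    out = np.empty_like(w)
    for g in range(n_in // group_size):
        cols = slice(g * group_size, (g + 1) * group_size)
        bits = high_bits if g == top else low_bits
        out[:, cols] = quantize_dequantize(w[:, cols], bits)
    return out, top
```

The round-trip helper stands in for the full GPTQ error-compensation loop; in the actual stack the per-group bit width would feed into GPTQ's column-by-column quantization rather than a one-shot rounding.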