PR #1908

Status: open

Record: PR #1855 base + activation-aware GPTQ mixed precision - val_bpb 1.06081 (3-seed mean)

by romeerp
val_bpb: 1.0608
Architecture: Transformer
Optimizer:
Artifact Size: ~15.99 MB

Training Techniques

Quantization
  • GPTQ (bits: 8; scope: selected 64-column group / mixed-precision base)
  • mixed int8 GPTQ (bits: 8; scope: one salient 64-column group)

Test-Time Training
  • full TTT (parameters: null)

Other
  • Activation-aware GPTQ calibration using per-input-channel activation RMS and AWQ-style saliency scoring to choose a salient column group for higher-precision quantization (parameters: {"group_size":64,"top_k":1})
  • Step-matched evaluation with FORCE_STOP_STEP to compare against PR #1855 at identical seed/step counts (parameters: {"seeds":[42,0,1234]})
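The activation-aware group selection above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name `select_salient_group` and the calibration batch `X_calib` are hypothetical, and the scoring (per-input-channel activation RMS times mean absolute weight) is one common AWQ-style choice.

```python
import numpy as np

def select_salient_group(W, X_calib, group_size=64, top_k=1):
    """Pick the top-k salient 64-column groups of W (out_features x in_features).

    AWQ-style score per input channel: activation RMS over the calibration
    batch times the mean absolute weight magnitude in that column.
    """
    act_rms = np.sqrt((X_calib ** 2).mean(axis=0))      # (in_features,)
    col_score = act_rms * np.abs(W).mean(axis=0)        # (in_features,)
    n_groups = W.shape[1] // group_size
    group_scores = (
        col_score[: n_groups * group_size]
        .reshape(n_groups, group_size)
        .sum(axis=1)                                    # one score per group
    )
    salient = np.argsort(group_scores)[::-1][:top_k]    # highest-scoring groups
    return sorted(int(g) for g in salient)
```

With `top_k=1` (as in the parameters above), the single winning group would be routed to the higher-precision int8 path inside the GPTQ solve.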

Novel Contributions

  • Activation-aware mixed-precision GPTQ path
  • AWQ-style saliency scoring using activation RMS and weight magnitude
  • Quantizing one salient 64-column group at int8 inside the GPTQ solve
  • Keeping stock PR #1855 LQER on top of the AWQ-aware GPTQ base
  • Step-matched 3-seed comparison against PR #1855 using identical stop steps
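To make the third bullet concrete, here is a sketch of quantizing one salient column group at int8 with symmetric per-column scales. It is an assumption-laden illustration: a full GPTQ solve would also propagate each column's rounding error into the not-yet-quantized columns, which this fake-quantization step omits.

```python
import numpy as np

def int8_quantize_group(W, cols):
    """Symmetric per-column int8 fake-quantization of the selected columns.

    Columns outside `cols` are returned unchanged; in the PR they would
    follow the base mixed-precision GPTQ path instead.
    """
    Wq = W.copy()
    block = W[:, cols]
    scale = np.abs(block).max(axis=0) / 127.0           # per-column scale
    scale = np.where(scale == 0, 1.0, scale)            # guard all-zero columns
    q = np.clip(np.round(block / scale), -127, 127)     # int8 grid
    Wq[:, cols] = q * scale                             # dequantize in place
    return Wq
```

Per column, the rounding error of this step is bounded by half the column's scale, i.e. `max|w| / 254`.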