PR #1908
Record: PR #1855 base + activation-aware GPTQ mixed precision - val_bpb 1.06081 (3-seed mean)
by romeerp
val_bpb
1.0608
Architecture
Transformer
Optimizer
—
Artifact Size
~15.99 MB
Training Techniques
Quantization
GPTQ
bits: 8
scope: selected 64-column group / mixed precision base
mixed int8 GPTQ
bits: 8
scope: one salient 64-column group
Test-Time Training
full TTT
parameters: null
Other
other
Activation-aware GPTQ calibration using per-input-channel activation RMS and AWQ-style saliency scoring to choose a salient column group for higher-precision quantization.
parameters: {"group_size":64,"top_k":1}
other
Step-matched evaluation with FORCE_STOP_STEP to compare against PR #1855 at identical seed/step counts.
parameters: {"seeds":[42,0,1234]}
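The activation-aware selection step (per-input-channel activation RMS times weight magnitude, summed over 64-column groups) can be sketched as follows. This is an illustrative assumption of how such scoring might look, not the PR's actual code; the function name, signature, and NumPy implementation are hypothetical.

```python
import numpy as np

def select_salient_group(W, X, group_size=64, top_k=1):
    """Pick the top_k most salient column groups of W (out_features x in_features).

    Saliency of input channel j = RMS of its activations over the
    calibration batch X (n_samples x in_features), scaled by the mean
    |W[:, j]| (AWQ-style). A group's score is the sum of its channel
    saliencies; the highest-scoring groups are kept at higher precision.
    """
    act_rms = np.sqrt((X ** 2).mean(axis=0))           # per-channel activation RMS
    saliency = act_rms * np.abs(W).mean(axis=0)        # activation scale x weight magnitude
    n_groups = W.shape[1] // group_size
    group_scores = saliency[: n_groups * group_size].reshape(
        n_groups, group_size).sum(axis=1)
    # Indices of the top_k highest-scoring 64-column groups
    return np.argsort(group_scores)[::-1][:top_k]
```

With `group_size=64, top_k=1` this matches the `parameters` recorded above: exactly one 64-column group is flagged as salient.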
Novel Contributions
- Activation-aware mixed-precision GPTQ path
- AWQ-style saliency scoring using activation RMS and weight magnitude
- Quantizing one salient 64-column group at int8 inside the GPTQ solve
- Keeping stock PR #1855 LQER on top of the AWQ-aware GPTQ base
- Step-matched 3-seed comparison against PR #1855 using identical stop steps
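As a rough illustration of the mixed-precision idea (a higher bit width only for the salient column group), here is a round-to-nearest fake-quantization sketch. It deliberately omits GPTQ's Hessian-based error compensation and the LQER correction, and all names and bit-width choices are hypothetical, not taken from the PR.

```python
import numpy as np

def mixed_precision_fakequant(W, salient_groups, group_size=64,
                              base_bits=4, salient_bits=8):
    """Per-column symmetric round-to-nearest fake quantization.

    Columns inside `salient_groups` are quantized at `salient_bits`;
    all other columns at `base_bits`. This stands in for the GPTQ
    solve, which would additionally propagate quantization error
    column-to-column using second-order (Hessian) statistics.
    """
    salient_cols = {j for g in salient_groups
                    for j in range(g * group_size, (g + 1) * group_size)}
    Wq = np.empty_like(W, dtype=float)
    for j in range(W.shape[1]):
        bits = salient_bits if j in salient_cols else base_bits
        qmax = 2 ** (bits - 1) - 1
        # Symmetric per-column scale; floor avoids division by zero
        scale = max(float(np.abs(W[:, j]).max()), 1e-12) / qmax
        Wq[:, j] = np.clip(np.round(W[:, j] / scale), -qmax - 1, qmax) * scale
    return Wq
```

The salient group's reconstruction error should be markedly smaller than the base columns', which is the intended effect of spending the extra bits where activation-weighted saliency is highest.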