PR #1908

Status: open

Record: PR #1855 base + activation-aware GPTQ mixed precision - val_bpb 1.06081 (3-seed mean)

by romeerp
val_bpb: 1.0608
Architecture: Transformer
Optimizer:
Artifact Size: ~15.99 MB

Training Techniques

Quantization
  • GPTQ (bits: 8; scope: selected 64-column group / mixed-precision base)
  • mixed int8 GPTQ (bits: 8; scope: one salient 64-column group)

Test-Time Training
  • full TTT (parameters: null)

Other
  • Activation-aware GPTQ calibration using per-input-channel activation RMS and AWQ-style saliency scoring to choose a salient column group for higher-precision quantization (parameters: {"group_size":64,"top_k":1})
  • Step-matched evaluation with FORCE_STOP_STEP to compare against PR #1855 at identical seed/step counts (parameters: {"seeds":[42,0,1234]})
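The activation-aware group selection above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name `select_salient_group` and the calibration batch `X_calib` are hypothetical, and the scoring (per-input-channel activation RMS times mean absolute weight) is one common AWQ-style choice.

```python
import numpy as np

def select_salient_group(W, X_calib, group_size=64, top_k=1):
    """Pick the top-k salient 64-column groups of W (out_features x in_features).

    AWQ-style score per input channel: activation RMS over the calibration
    batch times the mean absolute weight magnitude in that column.
    """
    act_rms = np.sqrt((X_calib ** 2).mean(axis=0))      # (in_features,)
    col_score = act_rms * np.abs(W).mean(axis=0)        # (in_features,)
    n_groups = W.shape[1] // group_size
    group_scores = (
        col_score[: n_groups * group_size]
        .reshape(n_groups, group_size)
        .sum(axis=1)                                    # one score per group
    )
    salient = np.argsort(group_scores)[::-1][:top_k]    # highest-scoring groups
    return sorted(int(g) for g in salient)
```

With `top_k=1` (as in the parameters above), the single winning group would be routed to the higher-precision int8 path inside the GPTQ solve.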

Novel Contributions

  • Activation-aware mixed-precision GPTQ path
  • AWQ-style saliency scoring using activation RMS and weight magnitude
  • Quantizing one salient 64-column group at int8 inside the GPTQ solve
  • Keeping stock PR #1855 LQER on top of the AWQ-aware GPTQ base
  • Step-matched 3-seed comparison against PR #1855 using identical stop steps
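To make the third bullet concrete, here is a sketch of quantizing one salient column group at int8 with symmetric per-column scales. It is an assumption-laden illustration: a full GPTQ solve would also propagate each column's rounding error into the not-yet-quantized columns, which this fake-quantization step omits.

```python
import numpy as np

def int8_quantize_group(W, cols):
    """Symmetric per-column int8 fake-quantization of the selected columns.

    Columns outside `cols` are returned unchanged; in the PR they would
    follow the base mixed-precision GPTQ path instead.
    """
    Wq = W.copy()
    block = W[:, cols]
    scale = np.abs(block).max(axis=0) / 127.0           # per-column scale
    scale = np.where(scale == 0, 1.0, scale)            # guard all-zero columns
    q = np.clip(np.round(block / scale), -127, 127)     # int8 grid
    Wq[:, cols] = q * scale                             # dequantize in place
    return Wq
```

Per column, the rounding error of this step is bounded by half the column's scale, i.e. `max|w| / 254`.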