PR #1251

open

Non-record: Online Hessian GPTQ (val_bpb=1.1349)

by ibarrajo
val_bpb
1.1349
Architecture
Transformer
Optimizer
Artifact Size
14.9 MB

Training Techniques

Quantization
GPTQ
bits: null
scope: all
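For orientation, here is a minimal sketch of how GPTQ uses a layer's Hessian to quantize it. It is a generic NumPy rendering of the standard column-by-column GPTQ update, not the code from this PR. The function names, the per-row symmetric scale, the 1% relative damping and the default of 6 bits are all hypothetical choices; the PR leaves `bits` unspecified. Each input column is rounded in turn, and its rounding error is pushed onto the not-yet-quantized columns through the upper Cholesky factor of the inverse Hessian.

```python
import numpy as np


def quantize_rtn(w, scale, qmax):
    """Round-to-nearest onto a symmetric integer grid: w ~ q * scale, |q| <= qmax."""
    return np.clip(np.round(w / scale), -qmax, qmax) * scale


def gptq_quantize(W, H, bits=6, damp=0.01):
    """Minimal GPTQ sketch (hypothetical helper, not the PR's implementation).

    W: (out_features, in_features) weight of a linear layer y = x @ W.T
    H: (in_features, in_features) Hessian proxy, proportional to X.T @ X
    bits: placeholder bit width (assumption; the PR reports bits: null)
    Returns the quantized weight and the per-row scales.
    """
    W = W.astype(np.float64).copy()
    cols = W.shape[1]
    qmax = 2 ** (bits - 1) - 1
    # Per-output-row symmetric scale from the original weights (assumed scheme).
    scale = np.abs(W).max(axis=1) / qmax
    scale[scale == 0] = 1.0
    # Dampen the Hessian relative to its mean diagonal so it is safely invertible.
    H = H.astype(np.float64) + damp * np.mean(np.diag(H)) * np.eye(cols)
    # Upper Cholesky factor U of H^-1, so that H^-1 = U.T @ U.
    U = np.linalg.cholesky(np.linalg.inv(H)).T
    Q = np.zeros_like(W)
    for i in range(cols):
        w = W[:, i]
        q = quantize_rtn(w, scale, qmax)
        Q[:, i] = q
        # Scale this column's rounding error and spread it onto later columns.
        err = (w - q) / U[i, i]
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q, scale
```

Usage: `Q, scale = gptq_quantize(W, H, bits=6)`, where `H` comes from calibration activations or, as in this PR, from statistics gathered during training. The first column is plain round-to-nearest; every later column absorbs the compensated error of the columns before it.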
Other
other
Online Hessian accumulation during training to eliminate post-training GPTQ calibration
parameters: null
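The online variant replaces the calibration pass with a running accumulator per linear layer. The sketch below is a hypothetical illustration: the `OnlineHessian` class and its method names are invented here. It assumes a plain running sum of `X.T @ X` over every step and every token. In a PyTorch training loop, `update` would most plausibly be called from a forward hook (`torch.nn.Module.register_forward_hook`). Because the weights keep changing during training, a decayed average that favors late-training activations is an equally plausible design; the PR does not say which was used.

```python
import numpy as np


class OnlineHessian:
    """Running sum of X.T @ X for one linear layer's inputs (hypothetical helper)."""

    def __init__(self, in_features):
        self.H = np.zeros((in_features, in_features))
        self.n = 0  # number of input rows (tokens) accumulated so far

    def update(self, x):
        # x: (..., in_features) activations feeding the layer on this training step.
        x = x.reshape(-1, x.shape[-1]).astype(np.float64)
        self.H += x.T @ x
        self.n += x.shape[0]

    def hessian(self):
        # Mean over tokens. GPTQ with diagonal-relative damping is unaffected
        # by this global scale, so the sum and the mean are interchangeable.
        return self.H / max(self.n, 1)
```

Accumulating batch by batch gives exactly the same matrix as one `X.T @ X` over all the collected activations. That equivalence is what allows the calibration pass to be skipped, at the cost of one extra matrix product per layer per step.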
Test-Time Training
TTT
parameters: null

Novel Contributions

  • Online Hessian accumulation during training for GPTQ
  • Elimination of the separate post-training GPTQ calibration pass
  • Demonstration that the per-step overhead of online accumulation outweighed the calibration time it saved
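A rough FLOP count suggests why the overhead can dominate. This is a back-of-the-envelope estimate, not the PR's measurement. It assumes the Hessian is updated on every step with every token, and it ignores memory traffic, token subsampling and the roughly 2x saving available from exploiting the symmetry of `X.T @ X`. Per token, accumulating `X.T @ X` costs about `2 * d_in**2` FLOPs, while the layer's forward and backward passes cost about `6 * d_in * d_out`. The ratio is therefore `d_in / (3 * d_out)`. That is about 1/3 for a square layer and about 4/3 for an MLP down-projection with `d_in = 4 * d_out`. The helper names and the break-even check below are hypothetical.

```python
def hessian_overhead_ratio(d_in, d_out):
    """Rough FLOP ratio of X.T @ X accumulation to one linear layer's training step.

    Per token: ~2 * d_in**2 FLOPs for the Hessian update, versus
    ~6 * d_in * d_out FLOPs for forward (2x) plus backward (4x).
    """
    return (2 * d_in * d_in) / (6 * d_in * d_out)


def online_pays_off(steps, overhead_s_per_step, calibration_s):
    """True only if the total accumulation overhead is below the calibration time saved."""
    return steps * overhead_s_per_step < calibration_s
```

Online accumulation only wins when `steps * overhead_s_per_step` stays below the one-off calibration time. Because the overhead grows with the number of training steps while calibration is a fixed cost, a long run can easily cross that break-even point.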