val_bpb: 1.1349
Architecture: Transformer
Optimizer: —
Artifact Size: 14.9 MB
Training Techniques
Quantization
GPTQ
bits: null
scope: all
Other
other
Online Hessian accumulation during training to eliminate post-training GPTQ calibration
parameters: null
Test-Time Training
TTT
parameters: null
Novel Contributions
- Online Hessian accumulation during training for GPTQ
- Eliminating separate post-training GPTQ calibration
- Demonstration that the per-step overhead of online accumulation outweighed the calibration time it saved
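
The online accumulation idea listed above can be sketched roughly as follows. This is a minimal illustration, not the implementation behind this entry: the card gives no details, so the class name `OnlineHessian`, the per-layer layout, and the scaling H = (2 / N) * sum of x x^T (the convention commonly used for GPTQ's layer-wise Hessian) are all assumptions. Pure Python is used so the sketch runs without dependencies; a real training loop would more likely do this with a tensor library on the layer's input activations.

```python
class OnlineHessian:
    """Running estimate of H = (2 / N) * sum_i x_i x_i^T for one linear layer.

    Hypothetical helper: the name and the scaling convention are assumptions.
    """

    def __init__(self, dim):
        self.dim = dim
        self.count = 0  # number of input rows folded in so far
        self.h = [[0.0] * dim for _ in range(dim)]

    def update(self, batch):
        """Fold a batch of input rows (each of length dim) into the estimate."""
        n = len(batch)
        if n == 0:
            return
        total = self.count + n
        keep = self.count / total  # weight kept by the previous estimate
        for i in range(self.dim):
            for j in range(self.dim):
                s = sum(x[i] * x[j] for x in batch)
                self.h[i][j] = self.h[i][j] * keep + 2.0 * s / total
        self.count = total


# Called once per training step with that step's layer inputs.
acc = OnlineHessian(dim=2)
acc.update([[1.0, 2.0]])
acc.update([[3.0, 4.0]])
```

Because the estimate is a running mean, the matrix is ready for GPTQ as soon as training ends, with no separate calibration pass. The cost is the extra outer-product work on every step, which is the overhead the last bullet refers to.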