| Field | Value |
|---|---|
| val_bpb | 1.2600 |
| Architecture | GPT-2 |
| Optimizer | — |
| Artifact Size | 15.86 MB |
Training Techniques
- Quantization: STE QAT (bits: 8, scope: model weights)
Novel Contributions
- GPT-2 baseline trained with int8 quantization-aware training (QAT)
- Straight-through estimator (STE) applied in CastedLinear.forward() to simulate int8 quantization during training
- Trains the model to be robust to post-training int8 quantization
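To illustrate the technique named above, here is a minimal sketch of STE-based int8 fake quantization. This is an assumption about the approach, not the actual CastedLinear.forward() implementation: the symmetric per-tensor scale rule and the clipping range are hypothetical choices, and the real code would operate on PyTorch tensors with autograd rather than NumPy arrays.

```python
import numpy as np

def fake_quantize_int8(w: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 fake quantization: snap weights to the
    int8 grid, then dequantize back to float for the forward pass.
    The max-abs scale rule here is a hypothetical choice."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale

def ste_forward(w: np.ndarray, upstream_grad: np.ndarray):
    """Straight-through estimator: the forward pass sees quantized
    weights, but the backward pass treats round() as identity, so the
    gradient w.r.t. the full-precision master weights is just the
    upstream gradient passed straight through."""
    return fake_quantize_int8(w), upstream_grad

# In PyTorch this is commonly written in one line inside forward():
#   w_ste = w + (fake_quantize_int8(w) - w).detach()
# detach() blocks the gradient of the quantization error, which is
# exactly the straight-through behavior sketched above.
```

The point of the STE is that rounding has zero gradient almost everywhere, so naive backpropagation through quantization would stall training; passing the gradient straight through keeps the full-precision weights learnable while the forward pass already experiences int8 precision, which is what makes the final model robust to post-training int8 quantization.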