val_bpb: 1.3515
Architecture: GPT
Optimizer: —
Artifact Size: 12,622,882 bytes
Training Techniques

- Quantization: int8 (bits: 8; scope: model weights)
- Architecture: GQA attention. Uses grouped-query attention in the model stack. (parameters: null)
- Other: late-window STE applied on CastedLinear during fake int8 quantization. (parameters: {"module":"CastedLinear","window":"late"})
- Other: torch.compile used with fullgraph disabled. (parameters: {"fullgraph": false})
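The list above only names grouped-query attention; as a point of reference, a minimal NumPy sketch of the idea (head counts, shapes, and the function name are illustrative, not taken from the submission):

```python
import numpy as np

def gqa_attention(q, k, v, n_kv_heads):
    """Grouped-query attention: n_q_heads query heads share n_kv_heads K/V heads.

    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads          # query heads per shared K/V head
    k = np.repeat(k, group, axis=0)          # expand K/V to match query heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `n_kv_heads == n_q_heads` this reduces to standard multi-head attention; fewer K/V heads shrink the KV cache at little quality cost.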
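"Late-window STE during fake int8 quantization" is terse; a hedged NumPy sketch of what that typically means (the window fraction, scale rule, and function names are assumptions, not the submission's actual code):

```python
import numpy as np

def fake_quant_int8(w):
    """Quantize-dequantize weights onto a symmetric int8 grid.

    The forward pass sees quantized values; under a straight-through
    estimator (STE) the backward pass treats this op as identity.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

def maybe_fake_quant(w, step, total_steps, window=0.1):
    """Late-window schedule: fake-quantize only in the final `window`
    fraction of training (the 0.1 fraction is an assumed placeholder)."""
    if step >= int((1.0 - window) * total_steps):
        return fake_quant_int8(w)
    return w
```

In a framework with autograd, the STE is usually written as `w + (fake_quant(w) - w).detach()` so gradients flow through `w` unchanged while the forward uses the quantized values.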
Novel Contributions
- Work-in-progress paired submission bundle (non-record)
- FAKE_QUANT_INT8 with late-window STE on CastedLinear
- torch.compile with fullgraph=False
- GQA attention
- Int8 export clipped at the 99.995th percentile
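The percentile-clipped export can be sketched as follows; this is a minimal illustration of setting the int8 scale from the 99.995th percentile of |w| rather than the absolute max (the function name and symmetric-scale choice are assumptions):

```python
import numpy as np

def export_int8(w, pct=99.995):
    """Symmetric int8 export with the scale taken from the `pct` percentile
    of |w|, so rare outlier weights do not inflate the scale; values beyond
    the clip point saturate at +/-127."""
    clip = np.percentile(np.abs(w), pct)
    scale = clip / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale
```

Dequantization is `q * scale`; the trade-off is a small saturation error on the clipped outliers in exchange for finer resolution on the bulk of the weights.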