PR #717 (open)

Grant nonrecord tied blocks

by JaksencView on GitHub
val_bpb: 1.3515
Architecture: GPT
Optimizer:
Artifact Size: 12,622,882 bytes
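For context on the val_bpb figure, a hedged sketch, assuming val_bpb is the mean validation cross-entropy measured in nats per byte and converted to bits by dividing by ln 2 (the standard nats-to-bits conversion; the exact evaluation protocol is not stated in this card):

```python
import math

def nats_to_bpb(mean_ce_nats_per_byte: float) -> float:
    # bits = nats / ln 2; applied per byte of validation text.
    return mean_ce_nats_per_byte / math.log(2)

# Under this assumption, the reported 1.3515 bpb corresponds to a mean
# cross-entropy of about 0.9368 nats per byte.
```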

Training Techniques

Quantization
int8
bits: 8
scope: model weights
Architecture
GQA attention
Uses grouped-query attention in the model stack.
parameters: null
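Since grouped-query attention is listed, here is a minimal pure-Python sketch of the core idea: several query heads share one key/value head. The function and variable names (gqa_scores, q_heads, k_heads) are illustrative, not taken from the PR:

```python
import math

# n_q query heads share n_kv key heads: each contiguous group of
# n_q // n_kv query heads computes scores against the same key head.

def gqa_scores(q_heads, k_heads):
    """q_heads: one query vector per query head.
    k_heads: one list of key vectors (positions) per KV head."""
    n_q, n_kv = len(q_heads), len(k_heads)
    assert n_q % n_kv == 0, "query heads must divide evenly into KV groups"
    group = n_q // n_kv
    d = len(q_heads[0])
    scores = []
    for h, q in enumerate(q_heads):
        keys = k_heads[h // group]  # shared KV head for this query head
        scores.append([sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                       for k in keys])
    return scores

# 4 query heads, 2 KV heads: heads 0-1 share KV head 0, heads 2-3 share KV head 1.
s = gqa_scores(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]],
    [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]],
)
```

A softmax over each row of scores followed by a weighted sum of the shared value heads completes the attention step; GQA's benefit is that only n_kv heads of keys and values need to be computed and cached.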
Other
other
Late-window straight-through estimator (STE) applied to CastedLinear during fake int8 quantization.
parameters: {"module":"CastedLinear","window":"late"}
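A hedged sketch of fake int8 quantization with a straight-through estimator. The quantize-dequantize forward pass and pass-through gradient are the standard technique; the "late window" gating (enabling fake quantization only near the end of training) is an assumption about this PR's behavior, and all names below are illustrative:

```python
def fake_quant_int8(w: float, scale: float) -> float:
    # Quantize-dequantize: round to the int8 grid, clamp, map back to float.
    q = max(-128, min(127, round(w / scale)))
    return q * scale

def ste_grad(upstream_grad: float) -> float:
    # Straight-through estimator: the gradient skips the non-differentiable
    # rounding and passes through unchanged.
    return upstream_grad

def maybe_fake_quant(w, scale, step, total_steps, window_frac=0.2):
    # Assumed "late window": fake quantization is only active in the final
    # window_frac of training steps.
    if step >= total_steps * (1 - window_frac):
        return fake_quant_int8(w, scale)
    return w
```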
other
torch.compile used with fullgraph=False, allowing graph breaks.
parameters: {"fullgraph":false}
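The torch.compile setting amounts to a one-line configuration; a sketch (requires PyTorch, so this is a config fragment rather than a tested example, and nn.Linear stands in for the actual model):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the real model
# fullgraph=False tolerates graph breaks: unsupported ops fall back to
# eager execution instead of raising, at the cost of fewer fused regions.
model = torch.compile(model, fullgraph=False)
out = model(torch.randn(2, 8))
```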

Novel Contributions

  • Work-in-progress, non-record paired submission bundle
  • FAKE_QUANT_INT8 with late-window STE on CastedLinear
  • torch.compile with fullgraph=False
  • GQA attention
  • Int8 export clipped at the 99.995th percentile
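The percentile-clipped export in the last bullet can be sketched as below; the nearest-rank percentile method and the function names are illustrative assumptions, with the 99.995 figure taken from the bullet itself:

```python
def percentile(sorted_vals, pct):
    # Nearest-rank percentile over a pre-sorted list (assumed method).
    idx = min(len(sorted_vals) - 1, int(pct / 100 * len(sorted_vals)))
    return sorted_vals[idx]

def export_int8(weights, pct=99.995):
    # Clip the quantization range at the pct-th percentile of |w| so a
    # handful of outliers do not inflate the scale for every weight.
    vals = sorted(abs(w) for w in weights)
    clip = percentile(vals, pct)
    scale = clip / 127 if clip else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale
```

Weights beyond the clip point saturate at ±127; with a 99.995th-percentile clip, only the most extreme ~0.005% of weights are affected, while the remaining weights get a finer quantization step.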