PR #1475

open

Non-record: 8xH100->1xH100 Two-Stage GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes

by Jaksenc
val_bpb: 1.1307
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: 15,651,808 bytes

Training Techniques

Quantization: GPTQ (bits: 6, scope: all)
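
The PR does not include the quantization code itself; below is a minimal sketch of the GPTQ error-compensation loop at 6 bits, assuming a PyTorch weight matrix and a calibration Hessian. The per-row symmetric scale and the damping value are illustrative choices, not the submission's settings.

```python
import torch

def gptq_quantize(W, H, bits=6, damp=0.01):
    """Quantize W (out_dim x in_dim) column by column, GPTQ-style:
    each column's rounding error is pushed onto the not-yet-quantized
    columns through the inverse Hessian. H is assumed to be the
    calibration Hessian 2 * X @ X.T accumulated from layer inputs."""
    W = W.clone().float()
    n = W.shape[1]
    qmax = 2 ** (bits - 1) - 1
    # One symmetric scale per output row (real GPTQ code often uses
    # finer-grained groups; this is an illustrative simplification).
    scale = W.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    # Dampen the diagonal for numerical stability, then invert.
    H = H + damp * H.diagonal().mean() * torch.eye(n)
    Hinv = torch.inverse(H)
    Q = torch.zeros_like(W)
    for j in range(n):
        w = W[:, j]
        q = (w / scale[:, 0]).round().clamp(-qmax - 1, qmax) * scale[:, 0]
        Q[:, j] = q
        err = (w - q) / Hinv[j, j]
        # Error compensation: spread this column's rounding error
        # over the columns still waiting to be quantized.
        W[:, j + 1:] -= err[:, None] * Hinv[j, j + 1:][None, :]
    return Q
```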
Architecture:
  • BigramHash: bigram hash embedding used in the base stack (dimensions: 3072 x 112); a sketch follows this list
  • XSA: applied to all 11 layers in the base stack (layers: 11)
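
The 3072 x 112 shape suggests a hashed bigram table of 3072 buckets with width 112; the sketch below shows one plausible reading, with the hash constants being assumptions. XSA is not expanded anywhere in the PR, so no sketch is attempted for it.

```python
import torch
import torch.nn as nn

class BigramHashEmbedding(nn.Module):
    """Embedding keyed on hashed (prev_token, cur_token) bigrams.

    All bigrams share 3072 buckets of width 112, matching the listed
    3072 x 112 shape. The hash constants are illustrative."""

    def __init__(self, n_buckets=3072, dim=112):
        super().__init__()
        self.n_buckets = n_buckets
        self.table = nn.Embedding(n_buckets, dim)

    def forward(self, tokens):
        # tokens: (batch, seq) integer token ids.
        prev = torch.roll(tokens, shifts=1, dims=1)
        prev[:, 0] = 0  # no predecessor at the first position
        # Cheap multiplicative hash of the bigram into a bucket id.
        h = (prev.long() * 1000003 + tokens.long() * 8191) % self.n_buckets
        return self.table(h)  # (batch, seq, dim)
```

How the 112-wide output is folded into the model width (projection, concatenation, or a dedicated slice) is not stated in the PR.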
Compression: lzma (level: 9)
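
Artifact packing at lzma level 9 maps directly onto Python's standard library; a minimal sketch, with the file names as placeholders:

```python
import lzma

# Pack the quantized checkpoint with lzma at preset 9, matching the
# listed compression settings.
with open("quantized_checkpoint.bin", "rb") as f:
    raw = f.read()

packed = lzma.compress(raw, preset=9)  # stdlib; 9 is the highest standard preset
with open("artifact.xz", "wb") as f:
    f.write(packed)

print(f"packed {len(raw)} -> {len(packed)} bytes")
```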
Weight Averaging: EMA
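
No EMA parameters are listed, so the decay below is an assumption; the update itself is the standard exponential moving average of weights:

```python
import torch

@torch.no_grad()
def update_ema(ema_params, model_params, decay=0.999):
    # In-place EMA update: ema <- decay * ema + (1 - decay) * param.
    for ema, p in zip(ema_params, model_params):
        ema.mul_(decay).add_(p, alpha=1 - decay)

# Typical usage: keep a detached copy of the weights, update it after
# every optimizer step, and export/evaluate the averaged copy.
# ema_params = [p.detach().clone() for p in model.parameters()]
# update_ema(ema_params, model.parameters())
```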
Evaluation: sliding window eval
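
A sliding-window bpb evaluation typically rereads long left context while scoring only the tokens an earlier window has not covered; the window and stride sizes below are assumptions, and `model` stands in for any causal LM returning logits.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_bpb(model, ids, n_bytes, window=2048, stride=512):
    """Bits-per-byte over a long token stream with overlapping windows.

    `model` is assumed to return (batch, seq, vocab) logits; `n_bytes`
    is the raw byte count of the evaluated text."""
    n = ids.numel()
    total_nll, scored_to = 0.0, 1  # position 0 has no target
    for begin in range(0, n, stride):
        end = min(begin + window, n)
        chunk = ids[begin:end].unsqueeze(0)            # (1, T)
        logits = model(chunk[:, :-1])[0]               # (T-1, vocab)
        nll = F.cross_entropy(logits, chunk[0, 1:], reduction="none")
        new = end - max(scored_to, begin + 1)          # unscored tail
        if new > 0:
            total_nll += nll[-new:].sum().item()
            scored_to = end
        if end == n:
            break
    # Summed nats -> bits, normalized by the raw byte count.
    return total_nll / math.log(2) / n_bytes
```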
Regularization: pruning
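
Pruning is listed with no parameters, so both the method and the ratio below are guesses; a global magnitude-pruning sketch at an assumed 50% sparsity:

```python
import torch

@torch.no_grad()
def magnitude_prune(model, sparsity=0.5):
    """Global magnitude pruning: zero the smallest-magnitude weights
    across all matrix parameters."""
    mats = [p for p in model.parameters() if p.dim() > 1]
    mags = torch.cat([p.abs().flatten() for p in mats])
    k = max(1, int(sparsity * mags.numel()))
    threshold = mags.kthvalue(k).values
    for p in mats:
        p.mul_((p.abs() > threshold).to(p.dtype))
```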

Novel Contributions

  • Validated a two-stage 8xH100 -> 1xH100 execution path (a sketch of the stage-2 flow follows this list)
  • Stage 1: training and checkpoint export on 8xH100
  • Stage 2: GPTQ quantization, artifact packing, and final evaluation on 1xH100
  • Kept the final artifact (15,651,808 bytes) under the 16,000,000-byte cap
  • Demonstrated that GPTQ and final evaluation can be moved off the expensive 8xH100 box
  • Documented a reusable non-record baseline for future compliant reruns
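
For concreteness, a hypothetical stage-2 driver tying the pieces together on the single H100. Every helper and path name here is an illustrative stand-in, not the submission's actual entry point:

```python
import lzma

def run_stage2(ckpt="stage1_checkpoint.pt", out="artifact.xz",
               size_cap=16_000_000):
    """Hypothetical stage-2 flow: load the stage-1 export, apply 6-bit
    GPTQ, lzma-pack the artifact, check the cap, then evaluate.
    load_model, quantize_all_layers, serialize_quantized, and
    evaluate_val_bpb are placeholder names, not real APIs."""
    model = load_model(ckpt)                  # stage-1 checkpoint
    quantize_all_layers(model, bits=6)        # GPTQ, scope: all
    payload = serialize_quantized(model)      # packed ints + scales
    packed = lzma.compress(payload, preset=9)
    assert len(packed) < size_cap, f"{len(packed)} bytes exceeds cap"
    with open(out, "wb") as f:
        f.write(packed)
    print("val_bpb:", evaluate_val_bpb(model))
```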