PR #1475
Non-record: 8xH100 -> 1xH100 Two-Stage GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes
by Jaksenc
val_bpb: 1.1307
Architecture: Transformer
Optimizer: —
Artifact Size: 15,651,808 bytes
Training Techniques
Quantization: GPTQ (bits: 6, scope: all)
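GPTQ proper uses second-order (Hessian-based) error compensation while quantizing; as an illustration of the 6-bit format alone, here is a minimal round-to-nearest sketch. Symmetric per-tensor scaling is an assumption — the PR does not state the exact scheme.

```python
def quantize_6bit(w):
    """Symmetric round-to-nearest 6-bit quantization: floats -> ints in [-31, 31]."""
    qmax = 2 ** (6 - 1) - 1                 # 31: 6 bits with one bit for sign
    amax = max(abs(x) for x in w) or 1.0    # avoid a zero scale for all-zero tensors
    scale = amax / qmax
    q = [max(-qmax, min(qmax, round(x / scale))) for x in w]
    return q, scale

def dequantize_6bit(q, scale):
    """Map 6-bit integers back to floats."""
    return [x * scale for x in q]
```

GPTQ would instead quantize weights column by column, folding each column's rounding error back into the not-yet-quantized columns; the storage format above is the same either way.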
Architecture:
- BigramHash — bigram hash embedding used in the base stack (dimensions: 3072 x 112)
- XSA — applied to all 11 layers in the base stack (layers: 11)
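The BigramHash entry lists a 3072 x 112 table; one plausible reading is 3072 hash buckets of 112-dimensional vectors, with each (previous, current) token pair hashed into a bucket. A sketch under that assumption — the hash function, BOS handling, and row/dimension split are illustrative, not taken from the PR:

```python
import hashlib

TABLE_ROWS, EMBED_DIM = 3072, 112   # shape from the card; the row/dim split is assumed

def bigram_bucket(prev_tok: int, tok: int) -> int:
    """Hash a (previous, current) token pair into one of TABLE_ROWS buckets."""
    digest = hashlib.blake2b(f"{prev_tok},{tok}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % TABLE_ROWS

def bigram_embed(table, tokens, bos=0):
    """Look up one hashed-bigram vector per position; the first token pairs with BOS."""
    out, prev = [], bos
    for tok in tokens:
        out.append(table[bigram_bucket(prev, tok)])
        prev = tok
    return out
```

Hashing makes the table size independent of the vocabulary squared, at the cost of bucket collisions between unrelated bigrams.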
Compression: lzma (level: 9)
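Level-9 lzma packing is available in Python's standard library; the PR does not specify the container format or exactly which bytes get packed, so this shows only the round-trip:

```python
import lzma

def pack(raw: bytes) -> bytes:
    # preset=9 corresponds to the card's "level: 9"
    return lzma.compress(raw, preset=9)

def unpack(blob: bytes) -> bytes:
    return lzma.decompress(blob)
```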
Weight Averaging: EMA
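EMA weight averaging keeps a shadow copy of the parameters that is updated after each optimizer step; the decay value below is an assumption, since the card lists no parameters:

```python
def ema_update(shadow, params, decay=0.999):
    """shadow <- decay * shadow + (1 - decay) * params, elementwise."""
    return [decay * s + (1 - decay) * p for s, p in zip(shadow, params)]
```

At evaluation time the shadow weights would be loaded in place of the raw training weights.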
Evaluation: sliding window eval
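Sliding-window evaluation re-scores overlapping windows so that every token after the first window is conditioned on a long left context, while counting each token exactly once. The model interface (`nll_fn`) and the window/stride sizes are assumptions:

```python
def sliding_window_nll(nll_fn, tokens, window=2048, stride=512):
    """Mean per-token negative log-likelihood with overlapping windows.

    nll_fn(context) must return one NLL per token of `context`;
    requires window > stride so windows overlap.
    """
    total, count = 0.0, 0
    start = 0
    while start < len(tokens):
        end = min(start + window, len(tokens))
        nlls = nll_fn(tokens[start:end])
        # skip tokens already scored by the previous window
        new_from = 0 if start == 0 else window - stride
        total += sum(nlls[new_from:end - start])
        count += (end - start) - new_from
        if end == len(tokens):
            break
        start += stride
    return total / count
```

With base-2 NLLs this mean is bits per token; dividing by bytes per token instead gives the bpb figure reported above.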
Regularization: pruning
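The card does not say which pruning scheme was used; a common default is magnitude pruning, sketched here. Ties at the threshold may zero slightly more than the target fraction:

```python
def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude `sparsity` fraction of weights."""
    k = int(len(w) * sparsity)
    if k == 0:
        return list(w)
    threshold = sorted(abs(x) for x in w)[k - 1]
    return [0.0 if abs(x) <= threshold else x for x in w]
```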
Novel Contributions
- Validated two-stage 8xH100 -> 1xH100 execution path
- Stage 1 training and checkpoint export on 8xH100
- Stage 2 GPTQ, artifact packing, and final evaluation on 1xH100
- Produced a final artifact under the 16,000,000 byte cap
- Demonstrated that GPTQ and final evaluation can be moved off the expensive 8xH100 box
- Documented a reusable non-record baseline for future compliant reruns
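The byte-cap check in the packing step can be made explicit; a sketch assuming the artifact is a single lzma-packed blob (the PR does not describe the actual packing layout):

```python
import lzma

SIZE_CAP = 16_000_000   # byte cap cited in the contributions above

def pack_under_cap(raw: bytes) -> bytes:
    """Compress at level 9 and fail loudly if the artifact exceeds the cap."""
    blob = lzma.compress(raw, preset=9)
    if len(blob) > SIZE_CAP:
        raise ValueError(f"artifact is {len(blob)} bytes, over the {SIZE_CAP}-byte cap")
    return blob
```

Failing at pack time keeps an oversized artifact from reaching the final evaluation stage on the 1xH100 box.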