PR #1353
Non-record: 11L XSA-All + EMA + Legal GPTQ on 1xH100 PCIe (1.1546 bpb)
Status: open
by Rtx09x
val_bpb: 1.1547
Architecture: Transformer
Optimizer: —
Artifact Size: 15,243,770 bytes
Training Techniques
Architecture: XSA
  All-layer XSA architecture variant used in the submitted model.
  parameters: {"layers":11,"scope":"all"}
Weight Averaging: EMA
  parameters: null
Quantization: GPTQ
  bits: 6
  scope: all
Compression: lzma
  level: null
Regularization: weight decay
  parameters: {"higher_than_default":true}
Sequence Length
  train_length: 1024
  eval_length: null
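The Weight Averaging entry lists EMA with no stated parameters. A minimal framework-free sketch of EMA over model weights, assuming a conventional decay around 0.999 (the PR does not specify its decay or update schedule):

```python
# Minimal EMA-of-weights sketch in plain Python.
# The decay value is an assumption; the PR lists EMA parameters as null.

class EMA:
    """Keeps an exponential moving average of a named-parameter dict."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Shadow copy initialized from the current weights.
        self.shadow = {name: value for name, value in params.items()}

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current, per parameter.
        d = self.decay
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * value
```

Typical use is to call `update` after each optimizer step and evaluate or export from `shadow` rather than the raw weights.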
Novel Contributions
- 11-layer XSA-all model variant
- EMA during training
- Legal self-generated GPTQ export
- Hardened GPTQ export with Cholesky retry and damping for non-PD Hessians
- Fallback to percentile int6 quantization when Hessian factorization fails
- Explicit pre-quant checkpoint saved before export
- Non-record unlimited-compute submission under the 16MB cap
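The hardening described above (Cholesky retry with damping for non-PD Hessians, then a percentile int6 fallback) can be sketched as follows. This is an illustrative pure-Python sketch, not the PR's code: the function names, the damping schedule, the retry count, and the symmetric [-31, 31] int6 range are all assumptions.

```python
import math

def cholesky(a):
    """Plain-Python Cholesky factorization; raises ValueError when `a` is
    not positive definite (the failure mode the export guards against)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def cholesky_with_damping(h, damp=1e-2, retries=5):
    """Retry Cholesky on a possibly non-PD Hessian, adding an increasing
    diagonal damping term each attempt (schedule is an assumption)."""
    n = len(h)
    mean_diag = sum(h[i][i] for i in range(n)) / n
    for attempt in range(retries):
        lam = damp * (10 ** attempt) * mean_diag
        damped = [[h[i][j] + (lam if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
        try:
            return cholesky(damped)
        except ValueError:
            continue
    return None  # caller falls back to percentile quantization

def percentile_int6(weights, pct=99.5):
    """Fallback: symmetric int6 quantization whose scale comes from a high
    percentile of |w| instead of the max, clipping outliers."""
    mags = sorted(abs(w) for w in weights)
    idx = min(len(mags) - 1, int(len(mags) * pct / 100.0))
    scale = mags[idx] / 31.0 or 1.0  # symmetric int6 range assumed
    q = [max(-31, min(31, round(w / scale))) for w in weights]
    return q, scale
```

The damping retry keeps the GPTQ path alive for mildly indefinite Hessians; only when every damped attempt fails does the exporter give up on second-order information and quantize from weight statistics alone.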
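Since the artifact is lzma-compressed and must land under the 16MB cap, the export step presumably ends with something like the sketch below. The preset and the exact cap arithmetic are assumptions (the PR lists the lzma level as null); only the use of lzma and the 16MB limit come from the submission itself.

```python
import lzma

SIZE_CAP = 16 * 1024 * 1024  # 16MB submission cap (exact definition assumed)

def compress_artifact(raw: bytes, preset: int = 9) -> bytes:
    """Compress the serialized model with lzma and verify it fits the cap."""
    blob = lzma.compress(raw, preset=preset)
    if len(blob) > SIZE_CAP:
        raise ValueError(f"artifact is {len(blob)} bytes, over the {SIZE_CAP} byte cap")
    return blob
```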