PR #725
Open
Submit 1x A100 QAT Fix - 1.5252 BPB (Non-Record) [v5]
by Shuvam-Banerji-SealView on GitHub
val_bpb
1.5252
Architecture
modded-nanogpt-derived Transformer
Optimizer
—
Artifact Size
15.77 MB
Training Techniques
Quantization
QAT
bits: 6
scope: all
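The card specifies 6-bit QAT over all weights but not the exact scheme. A minimal sketch of one common approach, symmetric fake quantization with a straight-through estimator and per-row absmax scaling, is below; the helper name `fake_quant` and the scaling choice are assumptions, not taken from the PR.

```python
import torch

def fake_quant(w: torch.Tensor, bits: int = 6, eps: float = 1e-8) -> torch.Tensor:
    # Per-row scale: max |w| along dim=1, clamped to keep it strictly positive.
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(eps)
    qmax = 2 ** (bits - 1) - 1  # 31 representable magnitudes for signed 6-bit
    # Quantize to integers in [-qmax, qmax], then dequantize back to float.
    q = torch.round(w / scale * qmax).clamp(-qmax, qmax)
    wq = q / qmax * scale
    # Straight-through estimator: forward uses wq, backward sees identity.
    return w + (wq - w).detach()
```

During training the linear layers would apply `fake_quant` to their weights on the forward pass, so the optimizer updates full-precision weights while the loss reflects 6-bit rounding.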
Architecture
CastedLinear clip factor estimator
Replaces torch.quantile with w.abs().amax(dim=1).clamp_min for faster clip-factor estimation and to avoid a Triton compilation slowdown.
parameters: null
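The two estimators can be contrasted as below. The quantile fraction `q=0.99` in the original path is a hypothetical placeholder (the PR does not state the value used); the replacement matches the expression named in the card.

```python
import torch

def clip_factor_quantile(w: torch.Tensor, q: float = 0.99) -> torch.Tensor:
    # Original path: per-row quantile of |w|; under torch.compile this op
    # reportedly triggered a slow Triton compilation.
    return torch.quantile(w.abs(), q, dim=1)

def clip_factor_amax(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Replacement: per-row max of |w|, clamped so the clip factor stays positive
    # even for an all-zero row.
    return w.abs().amax(dim=1).clamp_min(eps)
```

Since a row's absmax upper-bounds any quantile of its absolute values, the replacement clips slightly less aggressively while being a single cheap reduction.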
bigram embedding guard
Adds a guard for small-vocab edge cases in the bigram embedding path.
parameters: null
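The card does not describe the bigram guard itself. One plausible shape of the problem, sketched here purely as an assumption: if bigram pairs are indexed as `prev * vocab_size + cur` into a fixed-size table, a small vocabulary (or a table smaller than `vocab_size**2`) needs the index folded back into range. The function name and hashing choice below are hypothetical.

```python
import torch

def bigram_indices(tokens: torch.Tensor, vocab_size: int, table_size: int) -> torch.Tensor:
    # Pair each token with its predecessor (position 0 pairs with token id 0).
    prev = torch.cat([tokens.new_zeros(1), tokens[:-1]])
    idx = prev * vocab_size + tokens
    # Guard for small-vocab / small-table edge cases: fold indices into range
    # instead of indexing past the end of the embedding table.
    if table_size < vocab_size * vocab_size:
        idx = idx % table_size
    return idx
```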
Other
other
Makes compressor-dependent labels and final-roundtrip labels explicit in training logs.
parameters: null
Sequence Length
sequence_length
train_length: 131000
eval_length: null
Weight Averaging
SWA
parameters: null
Evaluation
sliding window eval
parameters: null
Compression
zstd
level: null
Novel Contributions
- Single-device A100 tuning of QAT hyperparameters to fit within the wallclock cap
- Replaced torch.quantile with w.abs().amax(dim=1).clamp_min to avoid a large Triton compilation slowdown
- Added a guard for small-vocab bigram embedding edge cases
- Made compressor-dependent and final-roundtrip labels explicit in training logs
- Reported final submission metric from post-export sliding-window roundtrip evaluation