PR #501

open

Non-record: 1xH100 warmdown100 30m scaling run

by aamodbhatt
val_bpb: 1.2771
Architecture: Transformer
Optimizer:
Artifact Size: 15.8 MB

Training Techniques

  • Quantization: int8 (bits: 8, scope: null)
  • LR Schedule: warmdown (warmdown_iters: 100)
  • Sequence Length: train_length: 1024, eval_length: null
  • Compression: zlib (level: null)
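The LR schedule is listed as warmdown with warmdown_iters: 100, i.e. a linear decay to zero over the final 100 steps. A minimal sketch of such a schedule, assuming a constant learning rate before the warmdown phase (base_lr and total_iters are illustrative values, not from this PR):

```python
def lr_at(step: int, total_iters: int,
          base_lr: float = 1e-3, warmdown_iters: int = 100) -> float:
    """Hold base_lr constant, then decay linearly to 0 over the last warmdown_iters steps."""
    steps_left = total_iters - step
    if steps_left >= warmdown_iters:
        return base_lr
    # Inside the warmdown window: scale linearly with the remaining steps.
    return base_lr * steps_left / warmdown_iters
```

With total_iters=1000 this returns base_lr for steps 0 through 900, then ramps down so that lr_at(1000, 1000) is 0.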

Novel Contributions

  • Budget-efficient 30-minute 1xH100 run using the baseline script with a WARMDOWN_ITERS=100 scheduler tweak
  • Same-session controlled comparison against a 10-minute baseline to measure the quality improvement
  • Demonstrated val_bpb improvement from extending wall-clock time from 10m to 30m on the same 1xH100 recipe
  • int8 quantization combined with zlib compression to fit under the 16 MB artifact-size cap
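The last bullet pairs int8 quantization with zlib compression to shrink the checkpoint below the size cap. A sketch of one way that pairing can work, assuming symmetric per-tensor quantization (the function names and scheme here are illustrative assumptions, not the PR's actual packing code):

```python
import zlib
import numpy as np

def pack_weights(weights: np.ndarray) -> tuple[bytes, float]:
    """Quantize float weights to int8 (symmetric, per-tensor), then zlib-compress."""
    # One scale per tensor, mapping the largest magnitude to 127.
    scale = max(float(np.abs(weights).max()), 1e-8) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return zlib.compress(q.tobytes()), scale

def unpack_weights(blob: bytes, scale: float, shape: tuple) -> np.ndarray:
    """Invert the packing: decompress, reinterpret as int8, rescale to float32."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale
```

The round trip is lossy only through quantization: the reconstruction error per weight is bounded by about half the scale, while the blob is at most a quarter of the float32 size before zlib even starts helping.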