PR #501

open

Non-record: 1xH100 warmdown100 30m scaling run

by aamodbhatt
val_bpb: 1.2771
Architecture: Transformer
Optimizer:
Artifact Size: 15.8 MB

Training Techniques

  • Quantization: int8 (bits: 8, scope: null)
  • LR Schedule: warmdown (warmdown_iters: 100)
  • Sequence Length: train_length: 1024, eval_length: null
  • Compression: zlib (level: null)
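The LR schedule is listed as warmdown with warmdown_iters: 100, i.e. a linear decay to zero over the final 100 steps. A minimal sketch of such a schedule, assuming a constant learning rate before the warmdown phase (base_lr and total_iters are illustrative values, not from this PR):

```python
def lr_at(step: int, total_iters: int,
          base_lr: float = 1e-3, warmdown_iters: int = 100) -> float:
    """Hold base_lr constant, then decay linearly to 0 over the last warmdown_iters steps."""
    steps_left = total_iters - step
    if steps_left >= warmdown_iters:
        return base_lr
    # Inside the warmdown window: scale linearly with the remaining steps.
    return base_lr * steps_left / warmdown_iters
```

With total_iters=1000 this returns base_lr for steps 0 through 900, then ramps down so that lr_at(1000, 1000) is 0.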

Novel Contributions

  • Budget-efficient 30-minute 1xH100 run using the baseline script with a WARMDOWN_ITERS=100 scheduler tweak
  • Same-session controlled comparison against a 10-minute baseline to measure the quality improvement
  • Demonstrated val_bpb improvement from extending wall-clock time from 10m to 30m on the same 1xH100 recipe
  • int8 quantization combined with zlib compression to fit under the 16 MB artifact-size cap
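The last bullet pairs int8 quantization with zlib compression to shrink the checkpoint below the size cap. A sketch of one way that pairing can work, assuming symmetric per-tensor quantization (the function names and scheme here are illustrative assumptions, not the PR's actual packing code):

```python
import zlib
import numpy as np

def pack_weights(weights: np.ndarray) -> tuple[bytes, float]:
    """Quantize float weights to int8 (symmetric, per-tensor), then zlib-compress."""
    # One scale per tensor, mapping the largest magnitude to 127.
    scale = max(float(np.abs(weights).max()), 1e-8) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return zlib.compress(q.tobytes()), scale

def unpack_weights(blob: bytes, scale: float, shape: tuple) -> np.ndarray:
    """Invert the packing: decompress, reinterpret as int8, rescale to float32."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale
```

The round trip is lossy only through quantization: the reconstruction error per weight is bounded by about half the scale, while the blob is at most a quarter of the float32 size before zlib even starts helping.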