PR #1163

open

[record] LR warmdown v1 (WARMDOWN_ITERS=900) with confirmed 10-min runs

by intelligentiaomni
val_bpb
1.5645
Architecture
Transformer
Optimizer
Artifact Size
9,589,715 bytes

Training Techniques

LR Schedule
warmdown
parameters: {"warmdown_iters":900}
linear warmup
parameters: {"warmup_steps":20}
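The baseline training code is not shown here, but the schedule described above (a short linear warmup, a constant plateau, then a linear warmdown over the final `warmdown_iters` steps) can be sketched as follows. The function name `lr_schedule` and the `total_iters`/`base_lr` parameters are illustrative assumptions, not names from the submission:

```python
def lr_schedule(step: int, total_iters: int, base_lr: float,
                warmup_steps: int = 20, warmdown_iters: int = 900) -> float:
    """Trapezoidal LR schedule: linear warmup, constant plateau, linear warmdown.

    A schedule-only change like this PR's shortens warmdown_iters (here 900)
    while leaving base_lr untouched.
    """
    if step < warmup_steps:
        # linear warmup from base_lr / warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    warmdown_start = total_iters - warmdown_iters
    if step < warmdown_start:
        # constant plateau at the baseline learning rate
        return base_lr
    # linear decay to zero over the final warmdown_iters steps
    return base_lr * (total_iters - step) / warmdown_iters
```

Note that shrinking `warmdown_iters` only moves `warmdown_start` later; the plateau LR and warmup length are unchanged, which is what makes this a schedule-only tweak.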
Compression
zlib
level: null
Quantization
int8
bits: 8
scope: model weights
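The artifact pipeline (int8 weights, then zlib at the default level, matching `level: null` above) might look like the sketch below. The helper names and the per-tensor symmetric quantization scheme are assumptions for illustration; the submission's actual packing code may differ:

```python
import zlib

import numpy as np


def pack_weights(w: np.ndarray, level: int = -1) -> tuple[bytes, float]:
    """Symmetric per-tensor int8 quantization, then zlib compression.

    level=-1 selects zlib's default compression level, analogous to
    the 'level: null' setting in the metadata.
    """
    scale = float(np.abs(w).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any nonzero scale round-trips it
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return zlib.compress(q.tobytes(), level), scale


def unpack_weights(blob: bytes, scale: float, shape: tuple) -> np.ndarray:
    """Inverse of pack_weights: decompress, then dequantize to float32."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale
```

Int8 alone already quarters the float32 footprint; zlib then squeezes out whatever redundancy remains in the quantized bytes, which is how a model fits under a hard artifact-size cap.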

Novel Contributions

  • Schedule-only tuning of the official baseline using warmdown timing changes
  • Reduced WARMDOWN_ITERS to 900 while keeping baseline optimizer LR values fixed
  • Confirmed the improvement with repeated 10-minute runs, staying under the artifact-size limit
  • Used int8 quantization with zlib compression to fit within size constraints