PR #313

open

non-record: LR warmdown on 1x A40 (1.723 bpb, 8.40MB)

by my-sonicase
val_bpb
1.7232
Architecture
baseline architecture
Optimizer
Artifact Size
8,397,395 bytes

Training Techniques

LR Schedule
warmdown
parameters: {"warmdown_iters":3600,"matrix_lr":0.06}
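
A minimal sketch of what a warmdown schedule with these parameters could look like. The PR does not show the schedule code, so the shape is an assumption: the learning rate is held at MATRIX_LR and then decays linearly to zero over the final WARMDOWN_ITERS steps. The function names and the total iteration count (HYPOTHETICAL_TOTAL_ITERS) are illustrative and not taken from the PR.

```python
MATRIX_LR = 0.06        # from the PR parameters
WARMDOWN_ITERS = 3600   # from the PR parameters
HYPOTHETICAL_TOTAL_ITERS = 10_000  # assumption: the PR does not state the run length


def warmdown_lr_scale(step: int, total_iters: int, warmdown_iters: int) -> float:
    """Return a multiplier in [0, 1].

    Assumed shape: 1.0 until the last `warmdown_iters` steps, then a
    linear decay that reaches 0.0 at `total_iters`.
    """
    if warmdown_iters <= 0:
        return 1.0
    warmdown_start = max(total_iters - warmdown_iters, 0)
    if step < warmdown_start:
        return 1.0
    return max(total_iters - step, 0) / warmdown_iters


def matrix_lr_at(step: int, total_iters: int = HYPOTHETICAL_TOTAL_ITERS) -> float:
    """Learning rate for matrix parameters at a given step under the assumed schedule."""
    return MATRIX_LR * warmdown_lr_scale(step, total_iters, WARMDOWN_ITERS)
```

Under these assumptions the rate stays at 0.06 through step 6400 and is halved at step 8200, the midpoint of the warmdown window.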
Quantization
int8
bits: 8
scope: all
Compression
zlib
level: null
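
A rough sketch of how an int8 + zlib artifact size could be measured. It is not the PR's export code. It assumes symmetric per-tensor quantization, and it reads "level: null" as the zlib default compression level; both are assumptions. The helper names are hypothetical, and only the standard library is used.

```python
import zlib
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization (assumed scheme): returns (int8 array, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]


def compressed_size_bytes(q):
    """Size of the quantized tensor after zlib at its default level (assumption)."""
    return len(zlib.compress(q.tobytes()))
```

Summing compressed_size_bytes over every tensor, plus any stored scales and metadata, would give a figure comparable to the reported artifact size.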

Novel Contributions

  • Schedule tuning only
  • WARMDOWN_ITERS=3600
  • MATRIX_LR=0.06
  • No architecture changes
  • No tokenizer or dataset changes
  • Improved over local MLX baseline under the 16MB constraint