val_bpb: 1.7232
Architecture: baseline architecture
Optimizer: —
Artifact Size: 8,397,395 bytes
Training Techniques

LR Schedule: warmdown (warmdown_iters: 3600, matrix_lr: 0.06)
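The warmdown parameters above can be read as a schedule that holds the matrix learning rate constant and then decays it over the final steps. A minimal sketch, assuming a hold-then-linear-decay shape and a `total_iters` parameter (the decay shape and total step count are not specified in this card; only warmdown_iters=3600 and matrix_lr=0.06 come from the run):

```python
def warmdown_lr(step: int, total_iters: int,
                base_lr: float = 0.06, warmdown_iters: int = 3600) -> float:
    """Hypothetical warmdown schedule: hold base_lr, then decay
    linearly to zero over the last `warmdown_iters` steps."""
    start = total_iters - warmdown_iters
    if step < start:
        return base_lr
    # Fraction of the warmdown remaining: 1.0 at its start, 0.0 at the end.
    frac = (total_iters - step) / warmdown_iters
    return base_lr * frac
```

For example, with 10,000 total iterations the rate stays at 0.06 until step 6,400 and reaches zero at step 10,000.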
Quantization: int8 (bits: 8, scope: all)
Compression: zlib (level: null)
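The artifact-size figure follows from quantizing every tensor to int8 ("scope: all") and zlib-compressing the result. A minimal sketch of such a pipeline, assuming symmetric per-tensor quantization with a stored float32 scale; the actual serialization format and zlib level of this run are not specified (the `pack_weights` name and the default level are illustrative):

```python
import zlib
import numpy as np

def pack_weights(weights: dict, level: int = 6) -> bytes:
    """Quantize each tensor to int8 with a per-tensor scale,
    concatenate, and zlib-compress the byte stream."""
    payload = bytearray()
    for name in sorted(weights):
        w = weights[name]
        # Symmetric scale mapping the max magnitude to 127;
        # floor avoids division by zero for all-zero tensors.
        scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        payload += np.float32(scale).tobytes()  # 4-byte scale header
        payload += q.tobytes()                  # 1 byte per weight
    return zlib.compress(bytes(payload), level)
```

At one byte per parameter before compression, an artifact of 8,397,395 bytes is consistent with a model of roughly eight million parameters or fewer, fitting under the 16MB constraint with headroom.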
Novel Contributions
- Schedule tuning only
- WARMDOWN_ITERS=3600
- MATRIX_LR=0.06
- No architecture changes
- No tokenizer or dataset changes
- Improved val_bpb over the local MLX baseline under the 16MB artifact-size constraint