val_bpb: 1.4537
Architecture: Transformer
Optimizer: Muon
Artifact Size: 11.38 MB

Training Techniques

Quantization: int6
bits: 6
scope: all
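The int6 scheme above (6 bits, applied to all weights) can be sketched as symmetric per-tensor quantization onto the signed range [-32, 31]. This is a minimal illustration, not the entry's actual code: the function names and the per-tensor scaling choice are assumptions.

```python
import numpy as np

def quantize_int6(w: np.ndarray):
    # Symmetric 6-bit signed range: integers in [-32, 31].
    # Per-tensor scaling is an assumption; the real scheme may differ
    # (e.g. per-channel scales or asymmetric ranges).
    max_abs = float(np.abs(w).max())
    scale = max_abs / 31.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -32, 31).astype(np.int8)
    return q, scale

def dequantize_int6(q: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct float weights; error is at most half a quantization step.
    return q.astype(np.float32) * scale

# Round-trip example on a small random weight matrix.
w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int6(w)
w_hat = dequantize_int6(q, s)
```

Note that the 6-bit codes still occupy one byte each here; a real artifact would bit-pack four codes into three bytes before compression.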
Architecture: V22
Custom V22 architecture with efficient parameter usage.
parameters: null

Optimizer: Muon
weight_decay: null
momentum: null
other_params: {"tuned": true}

Compression: zlib
level: 9
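Muon's core step is momentum accumulation followed by orthogonalization of the update via a Newton-Schulz iteration. The sketch below uses the coefficients and step count from the public Muon reference implementation; since this entry only discloses `{"tuned": true}`, the learning rate, momentum, and any other tuned values here are placeholder assumptions.

```python
import numpy as np

def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5) -> np.ndarray:
    # Quintic Newton-Schulz iteration; coefficients follow the public
    # Muon reference implementation (assumed, not confirmed for this entry).
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + 1e-7)  # scale so singular values are <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation for a smaller Gram matrix
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x

def muon_step(w, grad, buf, lr=0.02, momentum=0.95):
    # One Muon-style update for a 2-D weight matrix: momentum buffer,
    # then an (approximately) orthogonalized descent direction.
    # lr and momentum are illustrative defaults, not the tuned values.
    buf = momentum * buf + grad
    return w - lr * newton_schulz_orthogonalize(buf), buf
```

After a few iterations the singular values of the update are pushed toward 1, which is what distinguishes Muon from plain SGD with momentum.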
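The compression step corresponds to deflate at maximum effort, as exposed by Python's standard-library `zlib`. A round-trip sketch, with a stand-in payload (the real input would be the packed int6 weight bytes):

```python
import zlib

def pack_artifact(raw: bytes) -> bytes:
    # Level 9 = maximum compression effort, matching "level: 9" above.
    return zlib.compress(raw, 9)

def unpack_artifact(packed: bytes) -> bytes:
    return zlib.decompress(packed)

# Stand-in for serialized quantized weights; repetitive data compresses well.
payload = bytes(range(64)) * 256
packed = pack_artifact(payload)
restored = unpack_artifact(packed)
```

Decompression is lossless, so the only accuracy cost of the pipeline comes from the quantization step, not from zlib.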
Novel Contributions
- INT6 quantization for aggressive compression
- V22 architecture with efficient parameter usage
- Fast convergence under strict compute and size constraints
- Use of zlib compression to fit the artifact within the size limit
- Single RTX 4090 training setup