val_bpb
0.9627
Architecture
Transformer
Optimizer
—
Artifact Size
1.13 MB
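For reading the val_bpb figure above, here is a minimal sketch of how a bits-per-byte score is commonly computed. It assumes val_bpb means validation bits per byte, derived from a summed negative log-likelihood in nats. The function name and arguments are illustrative and are not taken from the submission.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    Assumption: total_nll_nats is summed over every predicted token of the
    validation set, and total_bytes is the size of the same text in bytes.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes the score independently of the tokenizer.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Under this definition a lower value is better, and a score below 1.0 means the model spends on average less than one bit per byte of validation text.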
Training Techniques
Architecture
Transformer
Balanced Peak Architecture: 12 layers, model dimension 768; number of attention heads not specified.
parameters: {"layers":12,"model_dim":768}
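The parameters line above can be loaded into a small typed config, sketched below. `ModelConfig` and `valid_head_counts` are hypothetical names, not part of the submission. Because the head count is not specified, the helper only lists the values that would divide the model dimension evenly, which is the usual constraint in multi-head attention. It does not guess which one was used.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for the two reported architecture parameters."""
    layers: int
    model_dim: int


def load_config(raw: str) -> ModelConfig:
    """Parse the JSON parameters string into a ModelConfig."""
    data = json.loads(raw)
    return ModelConfig(layers=int(data["layers"]), model_dim=int(data["model_dim"]))


def valid_head_counts(model_dim: int) -> list[int]:
    """Return every head count that splits model_dim into equal-sized heads."""
    return [h for h in range(1, model_dim + 1) if model_dim % h == 0]


# The string below is copied from the reported parameters line.
cfg = load_config('{"layers":12,"model_dim":768}')
candidates = valid_head_counts(cfg.model_dim)
```

Here `candidates` contains common choices such as 8, 12, and 16, but the source does not say which, if any, was used.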
Quantization
fully quantized
bits: null
scope: all
Novel Contributions
- Balanced Peak Architecture
- Balanced configuration selected after comparing three variants: Deep Stable, Wide Aggressive, and Balanced Peak
- Fully quantized and size-optimized model
- No test-time tricks; improvement attributed to training/model configuration