PR #2153

open

Exp/balanced 0.9627

by rixhavraj
val_bpb
0.9627
Architecture
Transformer
Optimizer
Artifact Size
1.13 MB
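
The val_bpb figure above is a bits-per-byte score on validation data. The PR summary does not show how it was computed, so the following is only a minimal sketch of the usual conversion from a mean per-token cross-entropy loss in nats. The function name and its arguments are hypothetical, not taken from the PR.

```python
import math


def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert a mean per-token loss (in nats) to bits per byte.

    Hypothetical helper: the PR does not show its evaluation code.
    """
    # Total bits spent on the validation text: nats -> bits via division by ln(2).
    total_bits = mean_loss_nats * n_tokens / math.log(2)
    # Normalize by the raw byte length of the same text.
    return total_bits / n_bytes
```

Normalizing by bytes rather than tokens keeps the score comparable across tokenizers with different vocabulary sizes.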

Training Techniques

Architecture
Transformer
Balanced Peak Architecture with 12 layers and model dimension 768; heads not specified.
parameters: {"layers":12,"model_dim":768}
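
As a rough illustration of the stated configuration, the sketch below stores the two reported values and estimates the weight count of the transformer blocks. Only layers and model_dim come from the PR summary. The class name, the function name, and the MLP expansion ratio of 4 are assumptions. The estimate ignores embeddings, biases, and normalization parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    layers: int = 12      # from the PR summary
    model_dim: int = 768  # from the PR summary
    mlp_ratio: int = 4    # ASSUMPTION: not stated in the PR


def approx_block_params(cfg: ModelConfig) -> int:
    """Estimate weight-matrix parameters in the transformer blocks only."""
    d = cfg.model_dim
    attn = 4 * d * d                 # Q, K, V, and output projections
    mlp = 2 * cfg.mlp_ratio * d * d  # up and down projections
    return cfg.layers * (attn + mlp)
```

Because the head count is not specified, the sketch relies on the fact that the attention projection sizes depend only on model_dim in a standard multi-head layout.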
Quantization
fully quantized
bits: null
scope: all

Novel Contributions

  • Balanced Peak Architecture
  • Balanced configuration selected after comparing deep stable, wide aggressive, and balanced peak variants
  • Fully quantized and size-optimized model
  • No test-time tricks; improvement attributed to training/model configuration