val_bpb: 1.3038
Architecture: Transformer
Optimizer: —
Artifact Size: 15,037,372 bytes
Training Techniques
Architecture: weight tying
Pure ternary MLP-only model variant with no ternary attention; 15-layer configuration.
parameters: {"layers": 15, "model_dim": 512, "num_heads": 8, "num_kv_heads": 4, "mlp_mult": 3}
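The parameter blob above pins down the dense shape of the model. A minimal sketch of what that configuration implies for the float parameter count, assuming grouped-query attention, a plain (non-gated) MLP, a tied input/output embedding, and a vocabulary size of 32,000 — the vocabulary size is not stated on the card, and norm/bias parameters are ignored:

```python
from dataclasses import dataclass

@dataclass
class Config:
    layers: int = 15
    model_dim: int = 512
    num_heads: int = 8
    num_kv_heads: int = 4
    mlp_mult: int = 3
    vocab_size: int = 32_000  # assumption: not stated on the card

def param_count(cfg: Config) -> int:
    d = cfg.model_dim
    head_dim = d // cfg.num_heads
    kv_dim = head_dim * cfg.num_kv_heads      # GQA: 4 KV heads of 8
    attn = d * d + 2 * d * kv_dim + d * d     # Q, K/V, output projections
    mlp = 2 * d * (cfg.mlp_mult * d)          # up + down projections
    emb = cfg.vocab_size * d                  # single tied embedding matrix
    return cfg.layers * (attn + mlp) + emb
```

With ternary MLP weights packed well below 16 bits each, a count in this range is consistent with a ~15MB artifact, but the exact breakdown depends on the packing format, which the card does not specify.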
Quantization: QAT
bits: null
scope: MLP
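The card only states QAT with MLP scope, so the exact quantizer is unknown. A minimal TWN-style sketch of ternary quantization-aware training: weights are mapped to {-1, 0, +1} times a per-tensor scale on the forward pass, while gradients (in a real framework) flow straight through to the latent float weights. The 0.75 threshold ratio is a common heuristic, not a value from the card:

```python
import numpy as np

def ternarize(w: np.ndarray, thresh_ratio: float = 0.75):
    """Map weights to {-1, 0, +1} * scale (TWN-style heuristic)."""
    delta = thresh_ratio * np.abs(w).mean()
    q = np.sign(w) * (np.abs(w) > delta)      # zero out small weights
    mask = q != 0
    scale = np.abs(w[mask]).mean() if mask.any() else 0.0
    return q, scale

def qat_forward(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Straight-through-estimator idea: forward uses the quantized
    weights; backward (not modeled here) updates the float w."""
    q, s = ternarize(w)
    return x @ (q * s)
```

Restricting this to MLP matrices while leaving attention in full precision matches the "no ternary attention" description above.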
Sequence Length
train_length: 1024
eval_length: null
LR Schedule: warmdown
parameters: null
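The card lists a warmdown schedule with no parameters, so the shape below is illustrative: hold a flat base learning rate, then decay linearly to zero over the final fraction of training. The base_lr and warmdown fraction are assumptions; total_steps = 20k matches the contributions note:

```python
def warmdown_lr(step: int, total_steps: int = 20_000,
                base_lr: float = 3e-4, warmdown_frac: float = 0.2) -> float:
    """Flat LR, then linear warmdown to zero over the last
    warmdown_frac of training. All hyperparameters here are
    illustrative; the card records none."""
    start = int(total_steps * (1.0 - warmdown_frac))
    if step < start:
        return base_lr
    return base_lr * max(0.0, (total_steps - step) / (total_steps - start))
```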
Regularization: magnitude pruning
parameters: {"pct": 0.032}
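Magnitude pruning with pct = 0.032 zeroes the smallest 3.2% of weights by absolute value. A minimal unstructured, per-tensor sketch; whether the submission prunes per tensor or globally is not stated:

```python
import numpy as np

def magnitude_prune(w: np.ndarray, pct: float = 0.032) -> np.ndarray:
    """Zero the smallest-|w| fraction `pct` of entries
    (pct = 0.032 from the card)."""
    k = int(round(pct * w.size))
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thresh] = 0.0   # ties at the threshold also drop
    return out
```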
Other
Under-cap final export for the 16MB track.
parameters: null
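A trivial sketch of the export-time check implied by the note above. Assumption: the 16MB cap is read as 16 × 1024 × 1024 bytes (16 MiB); the card does not state the convention, but the reported artifact of 15,037,372 bytes is under either reading:

```python
import os

SIZE_CAP = 16 * 1024 * 1024  # assumption: cap is 16 MiB, not 16 * 10**6

def under_cap(path: str) -> bool:
    """True if the exported artifact fits under the track's size cap."""
    return os.path.getsize(path) <= SIZE_CAP
```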
Novel Contributions
- Non-record unlimited-compute submission for the 16MB track
- Pure ternary MLP-only quantization with no ternary attention
- 15-layer configuration with 20k training steps
- Strong roundtrip match between quant proxy and final export
- Final artifact exported under the 16MB size cap (15,037,372 bytes)