val_bpb
1.2283
Architecture
Hybrid
Optimizer
—
Artifact Size
15,669,833 bytes
Training Techniques
Architecture
Hybrid
15-layer MLP-only ternary branch with targeted precision promotion on the tail MLP block.
parameters: {"layers":15,"mlp_only":true,"tail_block":"blocks.14.mlp"}
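The parameters above suggest a per-block precision assignment in which every MLP block stays ternary except the tail block. A minimal sketch of such a plan follows. The function name `precision_plan` and the labels "ternary" and "promoted" are hypothetical, the `blocks.{i}.mlp` naming is extrapolated from `tail_block`, and the card does not state which precision the promoted block actually uses.

```python
def precision_plan(num_layers, promoted):
    """Map each MLP block name to a precision label.

    Assumption: blocks are named "blocks.<i>.mlp", extrapolated from the
    card's tail_block value "blocks.14.mlp".
    """
    plan = {}
    for i in range(num_layers):
        name = f"blocks.{i}.mlp"
        # Blocks listed in `promoted` are kept out of the ternary branch.
        plan[name] = "promoted" if name in promoted else "ternary"
    return plan


plan = precision_plan(15, {"blocks.14.mlp"})
```

With the card's values, 14 blocks are labelled ternary and only `blocks.14.mlp` is promoted.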
Quantization
QAT
bits: null
scope: MLP
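The card does not say which ternary scheme the QAT uses. As an illustration only, the sketch below shows one common choice, an absmean quantizer that maps weights to {-1, 0, +1} with a single scale. The function name `ternarize` and the scheme itself are assumptions, not the submission's method. In quantization-aware training, a quantizer like this is commonly applied in the forward pass while gradients update full-precision latent weights.

```python
def ternarize(weights, eps=1e-8):
    """Quantize a list of floats to codes in {-1, 0, +1} plus one scale.

    Assumption: absmean scaling (scale = mean |w|); the card does not
    specify the actual scheme.
    """
    if not weights:
        return [], eps
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    # Round to the nearest integer, then clip into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


codes, scale = ternarize([0.9, -0.8, 0.05, -0.02, 0.4])
dequantized = [c * scale for c in codes]  # values the forward pass would see
```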
Regularization
magnitude pruning
parameters: {"pct":0.032}
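Magnitude pruning zeroes the weights with the smallest absolute values. The sketch below assumes `pct` is a fraction (0.032 meaning 3.2% of weights) and that pruning is applied over a flat list of weights. The card states neither, and `magnitude_prune` is a hypothetical helper name.

```python
def magnitude_prune(weights, pct):
    """Return a copy of `weights` with the smallest-|w| fraction zeroed.

    Assumption: `pct` is a fraction of the weight count, not a percentage.
    """
    k = round(len(weights) * pct)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices ordered by ascending magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy weights with distinct magnitudes 0.001 .. 1.000 and alternating sign.
weights = [((i + 1) / 1000) * (-1) ** i for i in range(1000)]
pruned = magnitude_prune(weights, 0.032)
```

On the 1000 toy weights, this zeroes the 32 entries of smallest magnitude and leaves the rest unchanged.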
Sequence Length
sequence_length
train_length: 1536
eval_length: null
Novel Contributions
- Selective precision protection for the tail MLP block
- MLP-only ternary branch within a hybrid design
- Longer training sequence length of 1536 for improved transfer
- Non-record unlimited-compute submission under the 16MB cap
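As a quick arithmetic check on the 16MB cap, the sketch below computes the headroom left by the reported artifact size. Whether the cap is 16,000,000 bytes or 16 MiB is not stated in the card, so both readings are shown as assumptions. The constant and function names are hypothetical.

```python
ARTIFACT_BYTES = 15_669_833      # artifact size reported in the card
CAP_DECIMAL = 16_000_000         # assumption: "16MB" read as 16 * 10**6 bytes
CAP_BINARY = 16 * 1024 * 1024    # assumption: "16MB" read as 16 MiB


def headroom(size_bytes, cap_bytes):
    """Bytes remaining under the cap (negative if the cap is exceeded)."""
    return cap_bytes - size_bytes


decimal_headroom = headroom(ARTIFACT_BYTES, CAP_DECIMAL)
binary_headroom = headroom(ARTIFACT_BYTES, CAP_BINARY)
```

The artifact fits under either reading, with 330,167 bytes to spare against the decimal cap and 1,107,383 bytes against the binary one.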