PR #1347 (open)

Add non-record submission for 15L ternary MLP-only 20k

by shasank0001
val_bpb: 1.3038
Architecture: Transformer
Optimizer: (not listed)
Artifact Size: 15,037,372 bytes

Training Techniques

Architecture: weight tying
Pure ternary MLP-only model variant with no ternary attention; 15-layer configuration.
parameters: {"layers":15,"model_dim":512,"num_heads":8,"num_kv_heads":4,"mlp_mult":3}
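The parameter set above can be mirrored in a small config object. This is a hypothetical sketch (the repo's actual config class is not shown in the PR); the field values come straight from the parameters JSON:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Hypothetical config mirroring the submission's parameters JSON."""
    layers: int = 15
    model_dim: int = 512
    num_heads: int = 8
    num_kv_heads: int = 4   # grouped-query attention: 4 KV heads serve 8 query heads
    mlp_mult: int = 3       # MLP hidden dim = 3 * model_dim = 1536

cfg = ModelConfig()
assert cfg.model_dim % cfg.num_heads == 0  # head_dim = 64
```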
Quantization: QAT
bits: null (ternary weights, ~1.58 effective bits)
scope: MLP
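For the QAT entry, a common ternary scheme quantizes each weight tensor to {-1, 0, +1} times a per-tensor scale, with a straight-through estimator in the backward pass. A minimal forward-pass sketch assuming absmean scaling (BitNet b1.58 style; the submission's exact scheme is not stated):

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize weights to {-1, 0, +1} * scale using absmean scaling
    (an assumption; the PR does not specify its scaling rule)."""
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q * scale, q, scale

w = np.array([0.9, -0.04, 0.5, -1.2])
w_q, q, s = ternary_quantize(w)
assert set(np.unique(q)).issubset({-1.0, 0.0, 1.0})  # codes are strictly ternary
```

During training, `w_q` replaces `w` in the forward pass while gradients flow to the full-precision `w` unchanged (the straight-through estimator).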
Sequence Length
train_length: 1024
eval_length: null
LR Schedule: warmdown
parameters: null
Regularization: magnitude pruning
parameters: {"pct":0.032}
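Magnitude pruning with pct = 0.032 zeroes the smallest 3.2% of weights by absolute value. A sketch, assuming per-tensor granularity (the PR does not say whether pruning is per-tensor or global):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, pct: float = 0.032) -> np.ndarray:
    """Zero the smallest-magnitude `pct` fraction of entries in `w`.
    pct matches the submission's parameter; per-tensor scope is assumed."""
    k = int(w.size * pct)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]  # k-th smallest |w|
    out = w.copy()
    out[np.abs(w) <= thresh] = 0.0
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=(1000,))
pruned = magnitude_prune(w)
assert (pruned == 0).sum() >= int(w.size * 0.032)  # at least 3.2% zeroed
```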
Other
Under-cap final export for the 16MB track.
parameters: null
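The size-cap arithmetic holds under either reading of "16MB": the 15,037,372-byte artifact fits below both 16 MiB (16,777,216 bytes) and the SI 16 MB (16,000,000 bytes). A quick check, assuming the binary MiB definition for the headroom figure:

```python
ARTIFACT_BYTES = 15_037_372          # from the submission metadata
CAP_MIB = 16 * 1024 * 1024           # 16 MiB = 16,777,216 bytes (assumed cap definition)
CAP_MB = 16_000_000                  # SI reading of "16MB"

assert ARTIFACT_BYTES < CAP_MB < CAP_MIB
headroom = CAP_MIB - ARTIFACT_BYTES
print(f"headroom under 16 MiB: {headroom:,} bytes")  # 1,739,844
```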

Novel Contributions

  • Non-record, unlimited-compute submission for the 16MB track
  • Pure ternary MLP-only quantization; attention weights are left unquantized
  • 15-layer configuration trained for 20k steps
  • Strong roundtrip match between the quantization proxy and the final export
  • Final artifact comes in under the 16MB size cap
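The roundtrip-match claim can be made concrete with a pack/unpack identity test: ternary codes written to the export should decode back bit-exactly to the QAT proxy's codes. A sketch using a hypothetical 2-bits-per-weight packing (the submission's real export format is not shown):

```python
import numpy as np

def pack_ternary(q: np.ndarray) -> np.ndarray:
    """Map {-1,0,1} -> {0,1,2} and pack 4 codes per byte (hypothetical format)."""
    codes = (q.astype(np.int8) + 1).astype(np.uint8)   # {-1,0,1} -> {0,1,2}
    codes = np.pad(codes, (0, (-len(codes)) % 4))      # pad to a multiple of 4
    c = codes.reshape(-1, 4)
    return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(b: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_ternary: recover the first n ternary codes."""
    out = np.stack([(b >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).ravel()[:n]
    return out.astype(np.int8) - 1

q = np.array([-1, 0, 1, 1, 0, -1, -1], dtype=np.int8)
assert np.array_equal(unpack_ternary(pack_ternary(q), len(q)), q)  # exact roundtrip
```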