PR #1346

open

Add non-record submission for 15L hybrid tail seq1536 20k

by shasank0001
val_bpb
1.2283
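val_bpb is the validation loss expressed in bits per byte of raw text. The evaluation script is not shown in this PR, so the sketch below assumes the standard conversion: sum the cross-entropy (in nats) over the validation set, convert to bits by dividing by ln 2, and normalize by the number of UTF-8 bytes rather than tokens. The function names are hypothetical.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss (nats) into bits per byte.

    Assumption: this mirrors the usual val_bpb definition; the exact
    evaluation code for this submission is not included in the PR.
    """
    # nats -> bits via ln(2); normalize by bytes so tokenizers compare fairly
    return total_loss_nats / (math.log(2) * total_bytes)


def bpb_from_mean_token_loss(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Same metric, starting from a mean per-token loss (hypothetical helper)."""
    return bits_per_byte(mean_loss_nats * n_tokens, n_bytes)
```

Normalizing by bytes is what makes the number comparable across submissions that use different tokenizers: a tokenizer that packs more bytes into each token is not rewarded simply for producing fewer tokens.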
Architecture
Hybrid
Optimizer
Artifact Size
15,669,833 bytes

Training Techniques

Architecture
Hybrid
15-layer MLP-only ternary branch with targeted precision promotion on the tail MLP block.
parameters: {"layers":15,"mlp_only":true,"tail_block":"blocks.14.mlp"}
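The idea of an MLP-only ternary branch with one promoted tail block can be sketched as a per-tensor export rule. This is a minimal illustration under stated assumptions, not the submission's code: ternary weights use absmean scaling (codes in {-1, 0, +1} plus one float scale), the "promoted" precision for `blocks.14.mlp` is assumed to be symmetric int8, and non-MLP tensors are left in float. Only the tail-block name comes from the submission's parameters; every function and tensor name here is hypothetical.

```python
from typing import Dict, List, Tuple

TAIL_BLOCK = "blocks.14.mlp"  # from the submission's parameters


def ternarize(w: List[float]) -> Tuple[List[int], float]:
    """Absmean ternary quantization: codes in {-1, 0, +1} and one scale."""
    scale = sum(abs(x) for x in w) / len(w) or 1.0
    return [max(-1, min(1, round(x / scale))) for x in w], scale


def int8_quantize(w: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 (assumed 'promoted' precision)."""
    scale = (max(abs(x) for x in w) or 1.0) / 127
    return [max(-127, min(127, round(x / scale))) for x in w], scale


def quantize_state(state: Dict[str, List[float]]) -> Dict[str, tuple]:
    """Map each tensor name to (format, values, scale)."""
    out = {}
    for name, w in state.items():
        if name.startswith(TAIL_BLOCK + "."):
            out[name] = ("int8", *int8_quantize(w))  # protected tail MLP
        elif ".mlp." in name:
            out[name] = ("ternary", *ternarize(w))   # MLP-only ternary branch
        else:
            out[name] = ("float", list(w), 1.0)      # assumed: left unquantized
    return out
```

Checking the tail prefix before the generic MLP rule is what makes the promotion "targeted": the last MLP block keeps finer resolution while every other MLP block pays only about 1.58 bits per weight.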
Quantization
QAT
bits: null
scope: MLP
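The card lists QAT on the MLP scope but leaves the bit width unspecified (`bits: null`). The sketch below assumes ternary fake quantization, consistent with the ternary branch above, and shows the straight-through estimator (STE) on a toy loss L = Σ(q(w) − t)². The forward pass sees quantize-then-dequantize weights, and the gradient with respect to those quantized weights is applied unchanged to the full-precision latent weights. All names and the toy loss are hypothetical.

```python
from typing import List


def ternary_fake_quant(w: List[float]) -> List[float]:
    """Quantize to {-1, 0, +1} * scale, then return dequantized floats."""
    scale = sum(abs(x) for x in w) / len(w) or 1.0
    return [max(-1, min(1, round(x / scale))) * scale for x in w]


def qat_step(latent: List[float], target: List[float], lr: float) -> List[float]:
    """One QAT update on the toy loss sum((q(w) - t)^2).

    STE: treat d q(w) / d w as 1, so dL/dq is applied directly to the
    latent full-precision weights, which keep accumulating small updates
    even when the rounded codes do not change.
    """
    q = ternary_fake_quant(latent)                          # forward uses quantized weights
    grad_q = [2 * (qi - ti) for qi, ti in zip(q, target)]   # dL/dq
    return [w - lr * g for w, g in zip(latent, grad_q)]     # straight-through update
```

Keeping the latent weights in full precision is the point of the design: rounding alone would erase most small gradient steps, while the latent copy lets them build up until a weight crosses a quantization boundary.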
Regularization
magnitude pruning
parameters: {"pct":0.032}
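With `pct` = 0.032, magnitude pruning zeroes the 3.2% of weights with the smallest absolute values. The card does not say whether the threshold is computed per tensor or globally, or when pruning happens relative to quantization, so the sketch below assumes a single per-tensor pass. The function name is hypothetical.

```python
from typing import List


def magnitude_prune(weights: List[float], pct: float) -> List[float]:
    """Zero the `pct` fraction of weights with the smallest |w| (per tensor)."""
    k = int(round(pct * len(weights)))
    if k == 0:
        return list(weights)
    # Indices sorted by magnitude; sorted() is stable, so ties keep position order.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]
```

For a 1,000-element tensor this removes 32 weights. Exact zeros tend to compress well, which is one plausible reason to prune lightly under a byte cap, though the submission does not state its motivation.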
Sequence Length
sequence_length
train_length: 1536
eval_length: null
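A training length of 1536 means each example covers 1536 positions. A common way to build such examples from a flat token stream is to take non-overlapping windows of 1537 tokens and split each one into inputs and next-token targets. The sketch below assumes that scheme; the submission's actual data loader is not shown, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple

SEQ_LEN = 1536  # train_length from the submission


def training_windows(
    tokens: List[int], seq_len: int = SEQ_LEN
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield non-overlapping (inputs, targets) pairs for next-token prediction.

    Each window reads seq_len + 1 tokens; targets are the inputs shifted by one.
    Trailing tokens that cannot fill a full window are dropped.
    """
    for start in range(0, len(tokens) - seq_len, seq_len):
        chunk = tokens[start:start + seq_len + 1]
        yield chunk[:-1], chunk[1:]
```

A stream of 3 × 1536 + 1 = 4609 tokens yields exactly three windows. Consecutive windows share one boundary token, so no prediction target is skipped.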

Novel Contributions

  • Selective precision protection for the tail MLP block (blocks.14.mlp)
  • MLP-only ternary branch within a hybrid design
  • Longer training sequence length of 1536 for improved transfer
  • Non-record unlimited-compute submission under the 16MB cap
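How much room the reported artifact leaves depends on how "16MB" is read. The card does not say, so the arithmetic below covers both interpretations, decimal (16,000,000 bytes) and binary (16 MiB = 16,777,216 bytes). Only the artifact size comes from the card.

```python
ARTIFACT_BYTES = 15_669_833  # reported artifact size

DECIMAL_CAP = 16 * 10**6  # "16MB" read as 16,000,000 bytes (assumption)
BINARY_CAP = 16 * 2**20   # "16MB" read as 16 MiB = 16,777,216 bytes (assumption)


def headroom(cap_bytes: int, used: int = ARTIFACT_BYTES) -> int:
    """Bytes remaining under the cap (negative would mean over budget)."""
    return cap_bytes - used


print(headroom(DECIMAL_CAP))  # 330167
print(headroom(BINARY_CAP))   # 1107383
```

Under the stricter decimal reading, the submission still fits with roughly 330 KB to spare.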