PR #998

open

Add Conker-5 tandem residual exact experts non-record submission

by asuramaya
val_bpb: 0.5755
Architecture: Hybrid
Optimizer:
Artifact Size: 3,811,521 bytes
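
For reference, val_bpb is conventionally read as validation bits per byte: the summed negative log-likelihood converted from nats to bits and divided by the number of raw bytes scored. The sketch below shows that conversion together with a size check against the 16MB artifact limit mentioned under Novel Contributions. The function names, the nats-based loss convention, and the decimal reading of 16MB (16,000,000 bytes) are assumptions for illustration, not details taken from this PR.

```python
import math

# Assumption: "16MB" is read as 16,000,000 bytes. The binary reading
# (16 * 1024**2 bytes) would be a looser limit.
ARTIFACT_LIMIT_BYTES = 16_000_000


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


def fits_limit(artifact_bytes: int, limit: int = ARTIFACT_LIMIT_BYTES) -> bool:
    """Return True when the packaged artifact is within the size limit."""
    return artifact_bytes <= limit
```

Under either reading of the limit, the reported 3,811,521-byte artifact passes `fits_limit`.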

Training Techniques

Quantization
int6
bits: 6
scope: artifact
Compression
zlib
level: null
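
The card above lists int6 quantization over the artifact followed by zlib compression, with no compression level recorded. A minimal sketch of such a pipeline follows, assuming symmetric per-tensor quantization, four 6-bit values packed into every three bytes, and zlib's default level. The PR does not state its actual scheme, so all function names and these choices are hypothetical.

```python
import zlib


def quantize_int6(weights):
    """Symmetric per-tensor quantization to integers in [-31, 31] (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 31.0 if max_abs > 0 else 1.0
    q = [max(-31, min(31, round(w / scale))) for w in weights]
    return q, scale


def pack_int6(q):
    """Pack values (shifted to 0..62) MSB-first, 6 bits each, zero-padded at the end."""
    bits, nbits, out = 0, 0, bytearray()
    for v in q:
        bits = (bits << 6) | (v + 31)
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_int6(data, count):
    """Inverse of pack_int6: read `count` 6-bit values back out."""
    bits, nbits, out = 0, 0, []
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 6 and len(out) < count:
            nbits -= 6
            out.append(((bits >> nbits) & 0x3F) - 31)
            bits &= (1 << nbits) - 1
    return out


def compress_artifact(packed: bytes) -> bytes:
    """zlib at its default level, since the card records level: null."""
    return zlib.compress(packed)
```

Decoding reverses the steps: `zlib.decompress`, then `unpack_int6`, then multiplying each integer by the stored scale, which recovers each weight to within half a quantization step.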
Architecture
Hybrid
Tandem-trained Conker-3 base with sparse exact residual experts and gate-only learned selection.
parameters: null
Sequence Length
sequence_length
train_length: 256
eval_length: 256

Novel Contributions

  • Tandem-trained Conker-3 base model
  • Sparse exact residual experts for exact continuation reuse
  • Gate-only learned selection over residual experts
  • Packaged non-record submission under the 16MB artifact limit
  • Local MLX/Apple Silicon training and packaging workflow
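
One possible reading of the first three contributions can be sketched as follows: each residual expert is a parameter-free exact-match table over recent tokens, it fires only when the current context has been seen verbatim, and a scalar gate per expert is the only learned quantity that decides how much of the expert's residual is added to the base model's logits. This is an illustrative interpretation, not the PR's implementation; the class `ExactExpert`, the function `combine`, and the sigmoid gate are all hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ExactExpert:
    """Hypothetical expert: records which token followed each exact context."""

    def __init__(self, order):
        self.order = order
        self.table = {}

    def observe(self, tokens):
        for i in range(self.order, len(tokens)):
            ctx = tuple(tokens[i - self.order:i])
            counts = self.table.setdefault(ctx, {})
            counts[tokens[i]] = counts.get(tokens[i], 0) + 1

    def residual(self, context, vocab_size):
        """Return a residual vector, or None when the context was never seen."""
        if len(context) < self.order:
            return None
        counts = self.table.get(tuple(context[-self.order:]))
        if counts is None:
            return None
        total = sum(counts.values())
        return [counts.get(t, 0) / total for t in range(vocab_size)]


def combine(base_logits, experts, gate_params, context):
    """Add gated residuals from the experts that fire; only gate_params are learned."""
    logits = list(base_logits)
    for expert, g in zip(experts, gate_params):
        r = expert.residual(context, len(logits))
        if r is None:
            continue  # sparse: a silent expert contributes nothing
        gate = 1.0 / (1.0 + math.exp(-g))  # assumed sigmoid gate
        logits = [x + gate * ri for x, ri in zip(logits, r)]
    return softmax(logits)
```

In this sketch the base model's prediction is left unchanged whenever no expert matches, so the experts can only add probability mass where an exact continuation is available.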