PR #998
open
Add Conker-5 tandem residual exact experts non-record submission
by asuramaya
val_bpb
0.5755
Architecture
Hybrid
Optimizer
—
Artifact Size
3,811,521 bytes
Training Techniques
Quantization
int6
bits: 6
scope: artifact
Compression
zlib
level: null
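A minimal sketch of how an int6 + zlib artifact of this kind could be produced and checked against the size limit. It is not the submission's packaging code. The symmetric per-tensor scale, the four-codes-in-three-bytes packing, the helper names, and the reading of "level: null" as zlib's default level are all assumptions made for illustration.

```python
import random
import zlib

# Assumption: the "16MB" limit is read as 16,000,000 bytes.
ARTIFACT_LIMIT_BYTES = 16_000_000


def quantize_int6(weights):
    """Symmetric per-tensor quantization to signed codes in [-31, 31] (hypothetical scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 31 if max_abs > 0 else 1.0
    codes = [max(-31, min(31, round(w / scale))) for w in weights]
    return codes, scale


def pack_int6(codes):
    """Pack four 6-bit codes into three bytes after shifting them to 0..62."""
    padded = list(codes) + [0] * (-len(codes) % 4)
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = (v + 31 for v in padded[i:i + 4])
        word = (a << 18) | (b << 12) | (c << 6) | d
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack_int6(blob, count):
    """Inverse of pack_int6; `count` drops the padding codes."""
    codes = []
    for i in range(0, len(blob), 3):
        word = int.from_bytes(blob[i:i + 3], "big")
        codes.extend(((word >> shift) & 0x3F) - 31 for shift in (18, 12, 6, 0))
    return codes[:count]


# Stand-in weights; a real artifact would hold the model's tensors.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]

codes, scale = quantize_int6(weights)
artifact = zlib.compress(pack_int6(codes))  # default level (assumed meaning of "level: null")
restored_codes = unpack_int6(zlib.decompress(artifact), len(codes))
restored = [c * scale for c in restored_codes]

print(len(artifact), len(artifact) <= ARTIFACT_LIMIT_BYTES)
```

Under this reading of the limit, the reported artifact size of 3,811,521 bytes is well below 16,000,000 bytes.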
Architecture
Hybrid
Tandem-trained Conker-3 base with sparse exact residual experts and gate-only learned selection.
parameters: null
Sequence Length
sequence_length
train_length: 256
eval_length: 256
Novel Contributions
- Tandem-trained Conker-3 base model
- Sparse exact residual experts for exact continuation reuse
- Gate-only learned selection over residual experts
- Packaged non-record submission under the 16MB artifact limit
- Local MLX/Apple Silicon training and packaging workflow
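
The card does not describe how the residual experts or the gate are implemented. As a rough illustration of the general idea only, the sketch below adds an exact-match suffix lookup to a base model's logits as a gated residual. The `ExactSuffixExpert` class, the one-hot residual, the byte-level vocabulary, and the fixed gate values (standing in for learned gate parameters) are all hypothetical and are not taken from the PR.

```python
import math

VOCAB = 256  # assumption: byte-level vocabulary


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ExactSuffixExpert:
    """Hypothetical expert: remembers the byte that last followed each length-k suffix."""

    def __init__(self, k):
        self.k = k
        self.table = {}

    def observe(self, data):
        for i in range(self.k, len(data)):
            self.table[bytes(data[i - self.k:i])] = data[i]

    def residual(self, context):
        """One-hot residual for the remembered continuation, or None without an exact match."""
        if len(context) < self.k:
            return None
        nxt = self.table.get(bytes(context[-self.k:]))
        if nxt is None:
            return None
        vec = [0.0] * VOCAB
        vec[nxt] = 1.0
        return vec


def mix(base_logits, experts, gates, context):
    """Add each matching expert's residual, scaled by its gate, to the base logits."""
    logits = list(base_logits)
    for expert, gate in zip(experts, gates):
        res = expert.residual(context)
        if res is None:
            continue  # sparse: experts without an exact match contribute nothing
        for j in range(VOCAB):
            logits[j] += gate * res[j]
    return softmax(logits)


history = b"abcabcabc"
experts = [ExactSuffixExpert(2), ExactSuffixExpert(3)]
for expert in experts:
    expert.observe(history)

base_logits = [0.0] * VOCAB  # stand-in for the base model's output
gates = [2.0, 3.0]           # stand-in for learned gate values

probs = mix(base_logits, experts, gates, context=b"xxab")
best = max(range(VOCAB), key=probs.__getitem__)
print(chr(best))
```

In this toy setup only the gates would be trained, while the lookup tables are filled by direct observation. That is one possible reading of "gate-only learned selection", not a confirmed description of the submission.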