val_bpb: 1.8658
Architecture: Transformer
Optimizer: —
Artifact Size: 15,980,840 bytes
Training Techniques
Architecture: JEPA-style regression transformer. A causal transformer trained to predict next-token embeddings with an MSE objective instead of direct token classification.
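As a minimal sketch of what this objective might look like in PyTorch (the `backbone` and `embed_table` names are hypothetical; the source does not publish its training code):

```python
import torch
import torch.nn.functional as F

def regression_loss(backbone, embed_table, tokens):
    """MSE between predicted latents and next-token target embeddings.

    backbone: causal transformer mapping token ids -> per-position latents
    embed_table: (vocab_size, d_model) target embedding matrix
    tokens: (batch, seq_len) LongTensor of token ids
    """
    latents = backbone(tokens[:, :-1])     # (B, T-1, d_model) predicted latents
    targets = embed_table[tokens[:, 1:]]   # (B, T-1, d_model) next-token embeddings
    return F.mse_loss(latents, targets)    # pure regression; no softmax or CE
```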
MLP (524,288 parameters): Small auxiliary rescuer decoder that maps raw regression latents v_void to corrected latents v_rescued before final decoding.
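A sketch of one plausible shape for the rescuer: the reported 524,288 parameters equals 2 × 512², consistent with two bias-free 512×512 linear layers, but the actual width, depth, and activation are assumptions:

```python
import torch.nn as nn

class Rescuer(nn.Module):
    """Maps raw regression latents v_void to corrected latents v_rescued.

    Two bias-free 512x512 linear layers give exactly 2 * 512^2 = 524,288
    parameters, matching the reported count; the true layout may differ.
    """
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model, bias=False),
            nn.GELU(),
            nn.Linear(d_model, d_model, bias=False),
        )

    def forward(self, v_void):
        return self.net(v_void)  # v_rescued
```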
Regularization: weight decay
Compression: zlib (level unspecified)
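The artifact size is presumably measured on the zlib-compressed checkpoint. A sketch of that measurement, assuming a PyTorch state dict and zlib's default compression level (the source leaves the level unspecified):

```python
import io
import zlib
import torch

def compress_checkpoint(model, path="artifact.bin.zlib"):
    """Serialize a state dict, zlib-compress it, and return the size in bytes."""
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    blob = zlib.compress(buf.getvalue())  # default level; source gives none
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)
```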
Novel Contributions
- JEPA-style regression language model for the Parameter Golf challenge
- Auxiliary rescuer decoder that corrects regression latents before token decoding
- Regression-only training with MSE against target token embeddings
- Three-seed stability evidence under the 10-minute 8xH100 budget
- Comparison against standalone regression-only baselines