PR #1257
openAdd: 11L Complement Training + TTT + No-JEPA submission (val_bpb 1.0855)
by BoxiYuView on GitHub
val_bpb
1.0855
Architecture
Transformer
Optimizer
AdamW
Artifact Size
15.99 MB
Training Techniques
Regularization
LeakyReLU
parameters: {"slope":0.5}
Other
other
Complement training that down-weights loss on tokens correctly predicted by a bigram predictor
parameters: {"alpha":0.5}
other
Disable JEPA auxiliary module
parameters: null
Test-Time Training
full TTT
parameters: {"learning_rate":0.0005,"epochs":3}
LR Schedule
cosine decay
parameters: null
Novel Contributions
- Complement training with bigram-based loss reweighting
- Test-time training on validation tokens
- Disabling JEPA auxiliary module improves validation score
- Best compliant submission achieves 1.0876 bpb; best overall achieves 1.0855 bpb