PR #1257

open

Add: 11L Complement Training + TTT + No-JEPA submission (val_bpb 1.0855)

val_bpb: 1.0855
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15.99 MB

Training Techniques

Regularization
LeakyReLU
parameters: {"slope":0.5}
Other
other
Complement training that down-weights loss on tokens correctly predicted by a bigram predictor
parameters: {"alpha":0.5}
other
Disable JEPA auxiliary module
parameters: null
Test-Time Training
full TTT
parameters: {"learning_rate":0.0005,"epochs":3}
LR Schedule
cosine decay
parameters: null
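
The complement-training entry above can be illustrated with a minimal sketch. The PR summary only states that loss is down-weighted on tokens a bigram predictor gets right, with alpha = 0.5. Everything else here is an assumption made for illustration: the helper names (fit_bigram, complement_weights, weighted_loss), the use of the bigram's most frequent next token as its prediction, the weight 1 - alpha for correctly predicted tokens, and the normalization by the sum of weights.

```python
from collections import Counter, defaultdict


def fit_bigram(tokens):
    """Map each token to its most frequent successor (hypothetical bigram predictor)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {prev: c.most_common(1)[0][0] for prev, c in counts.items()}


def complement_weights(tokens, bigram, alpha=0.5):
    """Per-target loss weights: 1 - alpha where the bigram guess is right, else 1.

    Assumption: the PR does not state the exact weighting rule; this is one
    plausible reading of "down-weights loss on correctly predicted tokens".
    """
    weights = []
    for prev, target in zip(tokens, tokens[1:]):
        hit = bigram.get(prev) == target
        weights.append(1.0 - alpha if hit else 1.0)
    return weights


def weighted_loss(token_losses, weights):
    """Weighted mean of per-token losses (normalizing by total weight is an assumption)."""
    return sum(l * w for l, w in zip(token_losses, weights)) / sum(weights)


tokens = [1, 2, 1, 2, 1, 3]
bigram = fit_bigram(tokens)
weights = complement_weights(tokens, bigram, alpha=0.5)  # [0.5, 0.5, 0.5, 0.5, 1.0]
loss = weighted_loss([1.0, 1.0, 1.0, 1.0, 3.0], weights)
```

In this toy input the final token is the only one the bigram misses, so it keeps full weight and pulls the weighted loss above the plain mean. In a real training loop the per-token losses would come from the Transformer's cross-entropy rather than a fixed list.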

Novel Contributions

  • Complement training with bigram-based loss reweighting
  • Test-time training on validation tokens
  • Disabling the JEPA auxiliary module improves the validation score
  • Best compliant submission reaches 1.0876 val_bpb; best overall reaches 1.0855 val_bpb