PR #873
E2E TTT: End-to-End Test-Time Training with Meta-Learning (1.0467 BPB)
by gowtham0992
val_bpb
1.0467
Architecture
Transformer
Optimizer
—
Artifact Size
13.12 MB
Training Techniques
Test-Time Training
score-first TTT
parameters: {"inner_learning_rate":0.001}
Other
other
MAML-style end-to-end test-time training with meta-learning; the outer loop backpropagates through the inner gradient steps (using create_graph=True) to optimize the initial weights for fast adaptation.
parameters: {"phased_training":true,"meta_learning_final_fraction":0.2}
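The MAML-style loop described above can be sketched as follows. This is a minimal toy illustration (a linear model standing in for the transformer), not the PR's implementation; the only detail taken from the source is the use of create_graph=True and an inner learning rate of 0.001.

```python
import torch

def inner_adapt(w, x, y, inner_lr=0.001, n_steps=1):
    """Inner-loop gradient step(s). create_graph=True keeps the graph
    alive so the outer loss can backpropagate through the update."""
    for _ in range(n_steps):
        loss = ((x @ w - y) ** 2).mean()
        (grad,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - inner_lr * grad  # differentiable update, not an in-place step
    return w

torch.manual_seed(0)
# w0 are the meta-learned initial weights being optimized in the outer loop.
w0 = torch.randn(4, 1, requires_grad=True)
opt = torch.optim.SGD([w0], lr=0.01)

x_support, y_support = torch.randn(8, 4), torch.randn(8, 1)
x_query, y_query = torch.randn(8, 4), torch.randn(8, 1)

opt.zero_grad()
w_adapted = inner_adapt(w0, x_support, y_support)
# Outer (meta) loss is evaluated with the *adapted* weights, so its
# gradient flows through the inner step back into w0.
outer_loss = ((x_query @ w_adapted - y_query) ** 2).mean()
outer_loss.backward()
opt.step()
```

The key design point is that the inner update is an ordinary differentiable expression rather than an optimizer step, which is what lets the outer gradient reach the initial weights.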
Architecture
MLP
Inner-loop adaptation updates only the MLP weights of the last 3 blocks while freezing attention, embeddings, and norms.
parameters: {"blocks":3}
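The selective-adaptation scheme above (MLP weights of the last 3 blocks trainable, everything else frozen) can be sketched like this. The Block class is a hypothetical stand-in with `attn`/`mlp`/`norm` submodules; the real model's module names are not given in the PR.

```python
import torch.nn as nn

class Block(nn.Module):
    """Toy transformer block: attention stand-in, MLP, and a norm."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.Linear(d, d)  # placeholder for the attention layers
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.norm = nn.LayerNorm(d)

blocks = nn.ModuleList([Block(16) for _ in range(6)])

# Freeze every parameter, then unfreeze only the MLPs of the last 3 blocks.
for p in blocks.parameters():
    p.requires_grad_(False)
for block in blocks[-3:]:
    for p in block.mlp.parameters():
        p.requires_grad_(True)

# Only these tensors are handed to the inner-loop optimizer.
adaptable = [p for p in blocks.parameters() if p.requires_grad]
```

Restricting adaptation this way keeps the inner loop cheap and avoids destabilizing attention, embeddings, and norms at test time.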
Evaluation
vectorized 7-gram backoff + kNN-LM
parameters: {"score_first":true}
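A backoff n-gram scorer of the kind interpolated with the model here can be sketched as below. This toy uses 3-grams and stupid backoff for brevity (the PR's variant is a vectorized 7-gram scorer combined with kNN-LM probabilities); all names and the backoff discount are illustrative.

```python
from collections import Counter

def build_counts(tokens, max_n=3):
    """Count all 1..max_n grams; counts[n] maps n-gram tuples to counts."""
    counts = [Counter() for _ in range(max_n + 1)]
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def backoff_score(context, tok, counts, max_n=3, alpha=0.4):
    """Stupid backoff: use the longest matching context, multiplying a
    discount alpha for each level backed off; tiny floor for unseen tokens."""
    for n in range(max_n, 0, -1):
        ctx = tuple(context[-(n - 1):]) if n > 1 else ()
        denom = counts[n - 1][ctx] if n > 1 else sum(counts[1].values())
        if counts[n][ctx + (tok,)] > 0 and denom > 0:
            return alpha ** (max_n - n) * counts[n][ctx + (tok,)] / denom
    return 1e-9
```

In a kNN-LM-style setup this score would be interpolated with the neural model's probability, e.g. `p = lam * p_retrieval + (1 - lam) * p_model`.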
Quantization
GPTQ
bits: null
scope: artifact
Novel Contributions
- First end-to-end test-time training submission in the competition
- MAML-style meta-learning that backpropagates through inner adaptation steps
- Phased training with a final meta-learning fine-tuning stage
- Score-first TTT combined with vectorized 7-gram backoff and kNN-LM evaluation
- GPTQ quantization to fit the artifact size limit