PR #873
E2E TTT: End-to-End Test-Time Training with Meta-Learning (1.0467 BPB)
by gowtham0992
val_bpb
1.0467
Architecture
Transformer
Optimizer
—
Artifact Size
13.12 MB
Training Techniques
Test-Time Training
score-first TTT
parameters: {"inner_learning_rate":0.001}
Other
other
MAML-style end-to-end test-time training with meta-learning; the outer loop backpropagates through the inner gradient steps (using create_graph=True) to optimize the initial weights for fast adaptation.
parameters: {"phased_training":true,"meta_learning_final_fraction":0.2}
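The MAML-style loop described above can be sketched as follows. This is a minimal toy illustration (a linear model standing in for the transformer), not the PR's implementation; the only detail taken from the source is the use of create_graph=True and an inner learning rate of 0.001.

```python
import torch

def inner_adapt(w, x, y, inner_lr=0.001, n_steps=1):
    """Inner-loop gradient step(s). create_graph=True keeps the graph
    alive so the outer loss can backpropagate through the update."""
    for _ in range(n_steps):
        loss = ((x @ w - y) ** 2).mean()
        (grad,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - inner_lr * grad  # differentiable update, not an in-place step
    return w

torch.manual_seed(0)
# w0 are the meta-learned initial weights being optimized in the outer loop.
w0 = torch.randn(4, 1, requires_grad=True)
opt = torch.optim.SGD([w0], lr=0.01)

x_support, y_support = torch.randn(8, 4), torch.randn(8, 1)
x_query, y_query = torch.randn(8, 4), torch.randn(8, 1)

opt.zero_grad()
w_adapted = inner_adapt(w0, x_support, y_support)
# Outer (meta) loss is evaluated with the *adapted* weights, so its
# gradient flows through the inner step back into w0.
outer_loss = ((x_query @ w_adapted - y_query) ** 2).mean()
outer_loss.backward()
opt.step()
```

The key design point is that the inner update is an ordinary differentiable expression rather than an optimizer step, which is what lets the outer gradient reach the initial weights.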
Architecture
MLP
Inner-loop adaptation updates only the MLP weights of the last 3 blocks while freezing attention, embeddings, and norms.
parameters: {"blocks":3}
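The selective-adaptation scheme above (MLP weights of the last 3 blocks trainable, everything else frozen) can be sketched like this. The Block class is a hypothetical stand-in with `attn`/`mlp`/`norm` submodules; the real model's module names are not given in the PR.

```python
import torch.nn as nn

class Block(nn.Module):
    """Toy transformer block: attention stand-in, MLP, and a norm."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.Linear(d, d)  # placeholder for the attention layers
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.norm = nn.LayerNorm(d)

blocks = nn.ModuleList([Block(16) for _ in range(6)])

# Freeze every parameter, then unfreeze only the MLPs of the last 3 blocks.
for p in blocks.parameters():
    p.requires_grad_(False)
for block in blocks[-3:]:
    for p in block.mlp.parameters():
        p.requires_grad_(True)

# Only these tensors are handed to the inner-loop optimizer.
adaptable = [p for p in blocks.parameters() if p.requires_grad]
```

Restricting adaptation this way keeps the inner loop cheap and avoids destabilizing attention, embeddings, and norms at test time.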
Evaluation
vectorized 7-gram backoff + kNN-LM
parameters: {"score_first":true}
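A backoff n-gram scorer of the kind interpolated with the model here can be sketched as below. This toy uses 3-grams and stupid backoff for brevity (the PR's variant is a vectorized 7-gram scorer combined with kNN-LM probabilities); all names and the backoff discount are illustrative.

```python
from collections import Counter

def build_counts(tokens, max_n=3):
    """Count all 1..max_n grams; counts[n] maps n-gram tuples to counts."""
    counts = [Counter() for _ in range(max_n + 1)]
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def backoff_score(context, tok, counts, max_n=3, alpha=0.4):
    """Stupid backoff: use the longest matching context, multiplying a
    discount alpha for each level backed off; tiny floor for unseen tokens."""
    for n in range(max_n, 0, -1):
        ctx = tuple(context[-(n - 1):]) if n > 1 else ()
        denom = counts[n - 1][ctx] if n > 1 else sum(counts[1].values())
        if counts[n][ctx + (tok,)] > 0 and denom > 0:
            return alpha ** (max_n - n) * counts[n][ctx + (tok,)] / denom
    return 1e-9
```

In a kNN-LM-style setup this score would be interpolated with the neural model's probability, e.g. `p = lam * p_retrieval + (1 - lam) * p_model`.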
Quantization
GPTQ
bits: null
scope: artifact
Novel Contributions
- First end-to-end test-time training submission in the competition
- MAML-style meta-learning that backpropagates through inner adaptation steps
- Phased training with a final meta-learning fine-tuning stage
- Score-first TTT combined with vectorized 7-gram backoff and kNN-LM evaluation
- GPTQ quantization to fit the artifact size limit