
SGD TTT

Test-Time Training
Used in: 2 PRs
Best BPB: 1.0857
Avg BPB: 1.1032

Hyperparameters Across PRs

pr_number  parameters
533        {"learning_rate":0.002,"epochs":3,"freeze_blocks":2,"max_train_chunks":50,"ema_decay":0}
1714       {"epochs_per_chunk":3,"momentum":0.9,"score_before_update":true}
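
To make the parameters above concrete, here is a minimal sketch of a score-before-update SGD loop. It is not the code from either PR. The model is a hypothetical toy (a single vector of byte logits), the function names are invented for illustration, and BPB is assumed to mean bits per byte. Only momentum, epochs_per_chunk, and score_before_update mirror keys from the table; the default learning rate is chosen for the toy model and is much larger than the 0.002 listed for PR 533.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def chunk_bits(logits, chunk):
    """Total code length of the chunk in bits under the current model."""
    probs = softmax(logits)
    return -sum(math.log2(probs[b]) for b in chunk)


def chunk_grad(logits, chunk):
    """Gradient of the mean negative log-likelihood with respect to the logits."""
    grad = softmax(logits)
    n = len(chunk)
    for b in chunk:
        grad[b] -= 1.0 / n
    return grad


def sgd_ttt(chunks, vocab_size=256, learning_rate=0.5, momentum=0.9,
            epochs_per_chunk=3, score_before_update=True):
    """Adapt a toy byte model chunk by chunk and return bits per byte.

    Hypothetical sketch: the model is a single logit vector, not a transformer.
    """
    logits = [0.0] * vocab_size
    velocity = [0.0] * vocab_size
    total_bits, total_bytes = 0.0, 0
    for chunk in chunks:
        if score_before_update:
            # Score first, so the model has not yet trained on these bytes.
            total_bits += chunk_bits(logits, chunk)
            total_bytes += len(chunk)
        for _ in range(epochs_per_chunk):
            grad = chunk_grad(logits, chunk)
            for i in range(vocab_size):
                # Classic SGD with momentum.
                velocity[i] = momentum * velocity[i] + grad[i]
                logits[i] -= learning_rate * velocity[i]
        if not score_before_update:
            # Scoring after the update evaluates on bytes already trained on.
            total_bits += chunk_bits(logits, chunk)
            total_bytes += len(chunk)
    return total_bits / total_bytes


if __name__ == "__main__":
    data = [b"abab" * 8 for _ in range(5)]
    print("no adaptation:", sgd_ttt(data, epochs_per_chunk=0))
    print("with SGD TTT: ", sgd_ttt(data))
```

With score_before_update set to true, each chunk contributes to the reported number before the model takes any gradient step on it, so the result reflects prediction on unseen bytes. Setting it to false scores each chunk after adapting on it, which would typically give a more optimistic figure.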