Test-Time Training (TTT)
Used in: 35 PRs
Best BPB: 0.0214
Avg BPB: 1.1111

Hyperparameters Across PRs

pr_number  parameters
196        {"run_ttt_eval":1}
212        {"max_steps":500,"freeze_blocks":1}
367        {"learning_rate":0.002}
371        {"epochs":3,"optimizer":"SGD"}
588
645
646
651
687        {"learning_rate":0.0001,"chunk_tokens":131072,"use_mixer":true}
818
901
962        {"epochs":0,"freeze_blocks":2,"learning_rate":0.0001}
1026
1058       {"enabled":false}
1184       {"enabled":false}
1243       {"enabled":0}
1250
1251
1307       {"enabled":false}
1398
1414       {"variant":"Discriminative TTT","per_block_adaptive_lr":true,"pre_quantization":true}
1569       {"mode":"off"}
1578
1814       {"enabled":false}
1832       {"enabled":true,"learning_rate":0.005,"epochs":3}
1849
1867       {"enabled":false}
1907       {"phased":true}
1954       {"learning_rate":0.006,"epochs":3}
1970       {"beta2":0.999}
1973       {"epochs":1,"learning_rate":0.005,"momentum":0.9,"chunk_size":32000}
2009       {"mode":"backward-only","adaptation_target":"layer norms"}
2013       {"rank":8}
2028       {"enabled":false}
2080       {"enabled":false}
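The rows above record "TTT is off" in several different ways ("enabled":false, "enabled":0, "mode":"off"), which makes the table awkward to filter. Below is a minimal sketch of one way to normalize these. The helper name `ttt_explicitly_disabled` is hypothetical. Treating a falsy "enabled" or a "mode" of "off" as disabled is an assumption about what the PR authors meant, not something the table states. Rows with no recorded parameters are left undetermined rather than guessed.

```python
import json

# A few rows copied from the table above: (pr_number, raw parameters JSON).
# An empty string stands for a row with no recorded parameters.
ROWS = [
    (588, ""),
    (1058, '{"enabled":false}'),
    (1243, '{"enabled":0}'),
    (1569, '{"mode":"off"}'),
    (1832, '{"enabled":true,"learning_rate":0.005,"epochs":3}'),
]


def ttt_explicitly_disabled(raw: str) -> bool:
    """Hypothetical helper: True only if the row explicitly turns TTT off.

    Assumption: a falsy "enabled" value (false or 0) or "mode" == "off"
    means disabled. Rows without parameters return False because nothing
    can be inferred from them.
    """
    if not raw:
        return False
    params = json.loads(raw)
    if "enabled" in params and not params["enabled"]:
        return True
    return params.get("mode") == "off"


disabled = [pr for pr, raw in ROWS if ttt_explicitly_disabled(raw)]
print(disabled)  # [1058, 1243, 1569]
```

The same check can be run over every row of the table. Other keys, such as "epochs":0 in PR 962, are deliberately not interpreted here because their meaning is less clear.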