Test-Time Training (TTT)
Used in: 23 PRs
Best BPB: 0.0214
Avg BPB: 1.0991

Hyperparameters Across PRs

pr_number  parameters
196        {"run_ttt_eval":1}
212        {"max_steps":500,"freeze_blocks":1}
367        {"learning_rate":0.002}
371        {"epochs":3,"optimizer":"SGD"}
588
645
646
651
687        {"learning_rate":0.0001,"chunk_tokens":131072,"use_mixer":true}
818
901
962        {"epochs":0,"freeze_blocks":2,"learning_rate":0.0001}
1026
1058       {"enabled":false}
1184       {"enabled":false}
1243       {"enabled":0}
1250
1251
1307       {"enabled":false}
1398
1414       {"variant":"Discriminative TTT","per_block_adaptive_lr":true,"pre_quantization":true}
1569       {"mode":"off"}
1578