Test-Time Training (TTT)
Used in: 35 PRs
Best BPB: 0.0214
Avg BPB: 1.1111
Submissions
| pr_number | author | bpb |
|---|---|---|
| 196 | sicauzxl | 1.3825 |
| 212 | mrdavtan | 1.1329 |
| 367 | ksang123 | 1.1770 |
| 371 | mrdavtan | 1.1401 |
| 588 | andyluo22 | 1.4120 |
| 645 | FlynnCruse | 1.8990 |
| 646 | Upsalla | 1.1349 |
| 651 | phulin | 1.2093 |
| 687 | RoyiRa | 1.0745 |
| 818 | lucamignatti | 0.5527 |
| 901 | Hilo-Hilo | 1.1590 |
| 962 | AnirudhRahul | 0.0214 |
| 1026 | danielxmed | 1.0945 |
| 1058 | resouer | 1.1247 |
| 1184 | icryo | 0.9485 |
| 1243 | simon-marcus | 1.1230 |
| 1250 | ibarrajo | 1.2094 |
| 1251 | ibarrajo | 1.1349 |
| 1307 | amrayach | 1.1101 |
| 1398 | Mertyandimata | 1.1047 |
| 1414 | Abhishek8108 | 0.7093 |
| 1569 | abbudjoe | 1.3576 |
| 1578 | mikeapedia | 1.0668 |
| 1814 | suryavanshi | 1.0742 |
| 1832 | sricursion | 1.0992 |
| 1849 | VedantKmr0 | 1.0850 |
| 1867 | lijuncheng16 | 1.7080 |
| 1907 | GodlyDonuts | 1.1071 |
| 1954 | Syed-M-Zeeshan | 1.0811 |
| 1970 | bsisduck | 1.0674 |
| 1973 | harborglowvintage-oss | 1.2193 |
| 2009 | SlavH | 1.0500 |
| 2013 | Wilbatronic | 1.0543 |
| 2028 | Arnie016 | 1.0898 |
| 2080 | NewyorkDev | 0.9727 |

Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 196 | {"run_ttt_eval":1} |
| 212 | {"max_steps":500,"freeze_blocks":1} |
| 367 | {"learning_rate":0.002} |
| 371 | {"epochs":3,"optimizer":"SGD"} |
| 588 | — |
| 645 | — |
| 646 | — |
| 651 | — |
| 687 | {"learning_rate":0.0001,"chunk_tokens":131072,"use_mixer":true} |
| 818 | — |
| 901 | — |
| 962 | {"epochs":0,"freeze_blocks":2,"learning_rate":0.0001} |
| 1026 | — |
| 1058 | {"enabled":false} |
| 1184 | {"enabled":false} |
| 1243 | {"enabled":0} |
| 1250 | — |
| 1251 | — |
| 1307 | {"enabled":false} |
| 1398 | — |
| 1414 | {"variant":"Discriminative TTT","per_block_adaptive_lr":true,"pre_quantization":true} |
| 1569 | {"mode":"off"} |
| 1578 | — |
| 1814 | {"enabled":false} |
| 1832 | {"enabled":true,"learning_rate":0.005,"epochs":3} |
| 1849 | — |
| 1867 | {"enabled":false} |
| 1907 | {"phased":true} |
| 1954 | {"learning_rate":0.006,"epochs":3} |
| 1970 | {"beta2":0.999} |
| 1973 | {"epochs":1,"learning_rate":0.005,"momentum":0.9,"chunk_size":32000} |
| 2009 | {"mode":"backward-only","adaptation_target":"layer norms"} |
| 2013 | {"rank":8} |
| 2028 | {"enabled":false} |
| 2080 | {"enabled":false} |
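
The summary figures at the top of the page can be reproduced from the per-PR scores, and the parameters column can be parsed as JSON to separate PRs that explicitly switch TTT off from the rest. The following is a minimal sketch, not the page's own tooling. It assumes that lower BPB is better (consistent with the listed best of 0.0214) and that `"enabled": false`, `"enabled": 0`, or `"mode": "off"` all mean TTT was not applied. The helper name `ttt_disabled` and the dictionary names are hypothetical.

```python
import json
from statistics import mean

# BPB per PR, copied from the Submissions list on this page.
BPB = {
    196: 1.3825, 212: 1.1329, 367: 1.1770, 371: 1.1401, 588: 1.4120,
    645: 1.8990, 646: 1.1349, 651: 1.2093, 687: 1.0745, 818: 0.5527,
    901: 1.1590, 962: 0.0214, 1026: 1.0945, 1058: 1.1247, 1184: 0.9485,
    1243: 1.1230, 1250: 1.2094, 1251: 1.1349, 1307: 1.1101, 1398: 1.1047,
    1414: 0.7093, 1569: 1.3576, 1578: 1.0668, 1814: 1.0742, 1832: 1.0992,
    1849: 1.0850, 1867: 1.7080, 1907: 1.1071, 1954: 1.0811, 1970: 1.0674,
    1973: 1.2193, 2009: 1.0500, 2013: 1.0543, 2028: 1.0898, 2080: 0.9727,
}

# Parameter cells copied from the table; PRs with no listed parameters are omitted.
PARAMS = {
    196: '{"run_ttt_eval":1}',
    212: '{"max_steps":500,"freeze_blocks":1}',
    367: '{"learning_rate":0.002}',
    371: '{"epochs":3,"optimizer":"SGD"}',
    687: '{"learning_rate":0.0001,"chunk_tokens":131072,"use_mixer":true}',
    962: '{"epochs":0,"freeze_blocks":2,"learning_rate":0.0001}',
    1058: '{"enabled":false}',
    1184: '{"enabled":false}',
    1243: '{"enabled":0}',
    1307: '{"enabled":false}',
    1414: '{"variant":"Discriminative TTT","per_block_adaptive_lr":true,"pre_quantization":true}',
    1569: '{"mode":"off"}',
    1814: '{"enabled":false}',
    1832: '{"enabled":true,"learning_rate":0.005,"epochs":3}',
    1867: '{"enabled":false}',
    1907: '{"phased":true}',
    1954: '{"learning_rate":0.006,"epochs":3}',
    1970: '{"beta2":0.999}',
    1973: '{"epochs":1,"learning_rate":0.005,"momentum":0.9,"chunk_size":32000}',
    2009: '{"mode":"backward-only","adaptation_target":"layer norms"}',
    2013: '{"rank":8}',
    2028: '{"enabled":false}',
    2080: '{"enabled":false}',
}


def ttt_disabled(raw: str) -> bool:
    """Assumption: a falsy "enabled" or mode == "off" means TTT was turned off."""
    cfg = json.loads(raw)
    if "enabled" in cfg and not cfg["enabled"]:
        return True
    return cfg.get("mode") == "off"


best_pr = min(BPB, key=BPB.get)            # PR with the lowest BPB
avg_bpb = round(mean(BPB.values()), 4)     # average over all listed PRs
disabled = sorted(pr for pr, raw in PARAMS.items() if ttt_disabled(raw))

print(best_pr, BPB[best_pr])   # 962 0.0214
print(avg_bpb)                 # 1.1111
print(disabled)
```

Under these assumptions, nine of the listed PRs carry an explicit off switch, so their BPB reflects a run tagged with TTT rather than a run that necessarily used it. Whether the page's average is meant to include them is not stated.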