
Test-Time Training (TTT)

Used in: 38 PRs

Best BPB: 0.0214

Avg BPB: 1.1110
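The page reports BPB without defining it. Assuming BPB means bits per byte (the usual metric for this kind of evaluation: total negative log-likelihood in bits divided by the UTF-8 byte count of the evaluated text, lower is better), a minimal conversion sketch is shown below. The function name `bits_per_byte` and the example numbers are hypothetical and not taken from any listed PR.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte.

    Assumption: total_nll_nats is summed over every token of the evaluation
    text, and total_bytes is the UTF-8 byte length of that same text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # nats -> bits: divide by ln 2; then normalise by bytes rather than tokens
    return total_nll_nats / (math.log(2) * total_bytes)


# Hypothetical example: 1,000 tokens averaging 2.5 nats each, spanning 4,000 bytes.
print(round(bits_per_byte(2.5 * 1000, 4000), 4))  # → 0.9017
```

Normalising by bytes rather than tokens keeps scores comparable across submissions that use different tokenizers.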

Submissions

PR #196 by sicauzxl

PR #212 by mrdavtan

PR #367 by ksang123

PR #371 by mrdavtan

PR #588 by andyluo22

PR #645 by FlynnCruse

PR #646 by Upsalla

PR #651 by phulin

PR #687 by RoyiRa

PR #818 by lucamignatti

PR #901 by Hilo-Hilo

PR #962 by AnirudhRahul

PR #1026 by danielxmed

PR #1058 by resouer

PR #1184 by icryo

PR #1243 by simon-marcus

PR #1250 by ibarrajo

PR #1251 by ibarrajo

PR #1307 by amrayach

PR #1398 by Mertyandimata

PR #1414 by Abhishek8108

PR #1569 by abbudjoe

PR #1578 by mikeapedia

PR #1814 by suryavanshi

PR #1832 by sricursion

PR #1849 by VedantKmr0

PR #1867 by lijuncheng16

PR #1907 by GodlyDonuts

PR #1954 by Syed-M-Zeeshan

PR #1970 by bsisduck

PR #1973 by harborglowvintage-oss

PR #2009 by SlavH

PR #2013 by Wilbatronic

PR #2028 by Arnie016

PR #2080 by NewyorkDev

PR #2098 by joshuaswanson

PR #2114 by Sacmaj

PR #2125 by tenet-diver

Hyperparameters Across PRs

PR	Reported TTT parameters
196	{"run_ttt_eval":1}
212	{"max_steps":500,"freeze_blocks":1}
367	{"learning_rate":0.002}
371	{"epochs":3,"optimizer":"SGD"}
588	—
645	—
646	—
651	—
687	{"learning_rate":0.0001,"chunk_tokens":131072,"use_mixer":true}
818	—
901	—
962	{"epochs":0,"freeze_blocks":2,"learning_rate":0.0001}
1026	—
1058	{"enabled":false}
1184	{"enabled":false}
1243	{"enabled":0}
1250	—
1251	—
1307	{"enabled":false}
1398	—
1414	{"variant":"Discriminative TTT","per_block_adaptive_lr":true,"pre_quantization":true}
1569	{"mode":"off"}
1578	—
1814	{"enabled":false}
1832	{"enabled":true,"learning_rate":0.005,"epochs":3}
1849	—
1867	{"enabled":false}
1907	{"phased":true}
1954	{"learning_rate":0.006,"epochs":3}
1970	{"beta2":0.999}
1973	{"epochs":1,"learning_rate":0.005,"momentum":0.9,"chunk_size":32000}
2009	{"mode":"backward-only","adaptation_target":"layer norms"}
2013	{"rank":8}
2028	{"enabled":false}
2080	{"enabled":false}
2098	{"learning_rate":0.008,"epochs":4}
2114	{"rank":288}
2125	{"enabled":false}
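The table lists keys such as `learning_rate`, `epochs`, and `chunk_tokens` without describing the procedure they control. The sketch below is a hypothetical toy, not any PR's implementation: it shows one common score-first TTT pattern in which each chunk of the evaluation stream is scored with the current weights and only then used for a few SGD steps. The model is a single unigram softmax rather than a transformer, so options like `freeze_blocks`, `rank`, `phased`, or layer-norm-only adaptation are not modelled, and the function and parameter names are assumptions.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ttt_eval(tokens, vocab_size, learning_rate=0.005, epochs=1, chunk_tokens=4):
    """Score-first test-time training on a toy unigram model (hypothetical).

    Each chunk is scored with the current weights before the model trains on
    it, so no token is predicted by weights that have already seen it.
    Returns bits per token (a toy stand-in for bits per byte).
    """
    logits = [0.0] * vocab_size  # start from a uniform distribution
    total_nll = 0.0
    for start in range(0, len(tokens), chunk_tokens):
        chunk = tokens[start:start + chunk_tokens]
        # 1) Score the chunk with the current, not-yet-adapted weights.
        probs = softmax(logits)
        total_nll += sum(-math.log(probs[t]) for t in chunk)
        # 2) Adapt on the chunk just scored: plain SGD on mean cross-entropy.
        counts = [0] * vocab_size
        for t in chunk:
            counts[t] += 1
        for _ in range(epochs):
            probs = softmax(logits)
            # d(mean NLL)/d(logit_v) = p_v - empirical frequency of v in the chunk
            grads = [probs[v] - counts[v] / len(chunk) for v in range(vocab_size)]
            logits = [w - learning_rate * g for w, g in zip(logits, grads)]
    return total_nll / (math.log(2) * len(tokens))


stream = [0, 0, 0, 1] * 16
print(ttt_eval(stream, vocab_size=4, learning_rate=0.0))  # no adaptation: stays uniform
print(ttt_eval(stream, vocab_size=4, learning_rate=0.5, epochs=3))  # adapts chunk by chunk
```

With `learning_rate=0.0` the model never moves from the uniform distribution (2 bits per token over a 4-symbol vocabulary); any small positive rate lets later chunks benefit from earlier ones, which is the effect the `enabled`/`learning_rate`/`epochs` settings above toggle and tune.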