
TTT-Linear

Test-Time Training
Used in: 1 PR
Best BPB: 1.1347
Avg BPB: 1.1347

Hyperparameters Across PRs

pr_number  parameters
1166       {"heads":8,"mini_batch":16,"learning_rate":1}