AdamW TTT

Test-Time Training
Used in: 8 PRs
Best BPB: 0.8265
Avg BPB: 1.0478

Hyperparameters Across PRs

pr_number  parameters
430        {"epochs":3,"learning_rate":0.001,"betas":[0.9,0.999],"frozen_layers":6}
462        {"learning_rate":0.0005,"epochs":10,"weight_decay":0}
489        {"learning_rate":0.0005,"weight_decay":0,"epochs":5}
532        {"epochs":10,"learning_rate":0.001,"grad_clip":1,"all_params_unfrozen":true}
555        {"epochs":10}
1350       {"epochs":6,"freeze_first_blocks":2}
1485       {"epochs":6,"learning_rate":0.0005,"freeze_blocks":2,"schedule":"cosine decay","pre_quant":true}
1488       {"epochs":10,"learning_rate":0.00045,"freeze_blocks":1}
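
The parameter records above use different key names and omit some fields, so they are easier to compare after normalizing. The following is a minimal Python sketch of one way to do that. The names ROWS, FREEZE_KEYS, and normalize are hypothetical helpers introduced here, not part of any PR. Treating frozen_layers, freeze_first_blocks, and freeze_blocks as one "frozen" count is an assumption; the PRs may count layers and blocks differently.

```python
import json
from collections import Counter

# Parameter records copied verbatim from the table above, keyed by PR number.
ROWS = {
    430: '{"epochs":3,"learning_rate":0.001,"betas":[0.9,0.999],"frozen_layers":6}',
    462: '{"learning_rate":0.0005,"epochs":10,"weight_decay":0}',
    489: '{"learning_rate":0.0005,"weight_decay":0,"epochs":5}',
    532: '{"epochs":10,"learning_rate":0.001,"grad_clip":1,"all_params_unfrozen":true}',
    555: '{"epochs":10}',
    1350: '{"epochs":6,"freeze_first_blocks":2}',
    1485: '{"epochs":6,"learning_rate":0.0005,"freeze_blocks":2,'
          '"schedule":"cosine decay","pre_quant":true}',
    1488: '{"epochs":10,"learning_rate":0.00045,"freeze_blocks":1}',
}

# ASSUMPTION: these three keys all describe "how much of the model is frozen".
# The PRs may not use the same unit (layers vs. blocks).
FREEZE_KEYS = ("frozen_layers", "freeze_first_blocks", "freeze_blocks")


def normalize(raw):
    """Parse one JSON record and map it onto a common set of fields.

    Fields a PR did not report are kept as None rather than guessed.
    """
    cfg = json.loads(raw)
    frozen = next((cfg[k] for k in FREEZE_KEYS if k in cfg), None)
    return {
        "epochs": cfg.get("epochs"),
        "learning_rate": cfg.get("learning_rate"),
        "frozen": frozen,
    }


configs = {pr: normalize(raw) for pr, raw in ROWS.items()}

# Count how often each reported value appears, skipping unreported fields.
epoch_counts = Counter(c["epochs"] for c in configs.values())
lr_counts = Counter(
    c["learning_rate"] for c in configs.values() if c["learning_rate"] is not None
)

if __name__ == "__main__":
    for pr, cfg in sorted(configs.items()):
        print(pr, cfg)
    print("epochs:", dict(epoch_counts))
    print("learning rates:", dict(lr_counts))
```

Missing values stay as None because the table does not say which defaults PRs 555 and 1350 used for the learning rate, so the counts only cover what each PR actually reported.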