two-phase TTT

Test-Time Training
Used in: 3 PRs
Best BPB: 1.1216
Avg BPB: 1.1220

Hyperparameters Across PRs

PR   Phase  Method                             Epochs  Optimizer  Learning rate  Trainable params
410  1      norm-only recalibration            100     Adam       0.01           LayerNorm weights, scales, final_norm
410  2      selective-freeze block adaptation  15      SGD        0.003          last 2 transformer blocks, norms, scales, lm_head
415  1      norm-only recalibration            100     Adam       0.01           ~22K
415  2      selective-freeze block adaptation  25      SGD        0.005          ~7.6M
417  1      norm-only recalibration            50      Adam       0.01           ~22K
417  2      selective-freeze block adaptation  10      SGD        0.005          ~7.6M
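
The parameter selection described for PR 410 can be sketched roughly as follows. This is a minimal, framework-free illustration, not the code from any of the PRs: the parameter names (`blocks.<i>...`, `final_norm`, `lm_head`), the block count of 12, and the helper `is_trainable` are all hypothetical, and matching on the substrings "norm" and "scale" is an assumed naming convention. Only the epochs, optimizers, and learning rates are taken from the PR 410 entry.

```python
# Hyperparameters copied from the PR 410 entry; everything else is assumed.
SCHEDULE_PR410 = {
    1: {"method": "norm-only recalibration",
        "epochs": 100, "optimizer": "Adam", "learning_rate": 0.01},
    2: {"method": "selective-freeze block adaptation",
        "epochs": 15, "optimizer": "SGD", "learning_rate": 0.003},
}


def is_trainable(name: str, phase: int, num_blocks: int) -> bool:
    """Decide whether a parameter is unfrozen in the given phase.

    Assumes (hypothetically) that transformer blocks are named
    'blocks.<index>....' and that norm/scale parameters contain
    'norm' or 'scale' in their names.
    """
    is_norm_or_scale = "norm" in name or "scale" in name
    if phase == 1:
        # Phase 1: only norm weights and scales are adapted.
        return is_norm_or_scale
    # Phase 2: last two blocks, all norms and scales, and the LM head.
    parts = name.split(".")
    in_last_two_blocks = (
        len(parts) > 1
        and parts[0] == "blocks"
        and parts[1].isdigit()
        and int(parts[1]) >= num_blocks - 2
    )
    return is_norm_or_scale or in_last_two_blocks or name.startswith("lm_head")


# Hypothetical parameter names for a 12-block model.
NAMES = [
    "embed.weight",
    "blocks.0.attn.qkv.weight",
    "blocks.0.norm1.weight",
    "blocks.10.mlp.fc.weight",
    "blocks.11.attn.qkv.weight",
    "final_norm.weight",
    "lm_head.weight",
]

phase1 = [n for n in NAMES if is_trainable(n, 1, num_blocks=12)]
phase2 = [n for n in NAMES if is_trainable(n, 2, num_blocks=12)]
```

In a framework such as PyTorch, the same predicate could drive each parameter's `requires_grad` flag before building the phase's optimizer, so that phase 1 touches only the small norm/scale set and phase 2 widens the trainable set to the last blocks and the head.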