two-phase TTT
Category: Test-Time Training
Used in: 3 PRs
Best BPB: 1.1216
Avg BPB: 1.1220
Hyperparameters Across PRs
| PR | Phase | Method | Epochs | Optimizer | Learning rate | Trainable (unfrozen) params |
|---|---|---|---|---|---|---|
| 410 | 1 | norm-only recalibration | 100 | Adam | 0.01 | LayerNorm weights, scales, final_norm |
| 410 | 2 | selective-freeze block adaptation | 15 | SGD | 0.003 | last 2 transformer blocks, norms, scales, lm_head |
| 415 | 1 | norm-only recalibration | 100 | Adam | 0.01 | ~22K |
| 415 | 2 | selective-freeze block adaptation | 25 | SGD | 0.005 | ~7.6M |
| 417 | 1 | norm-only recalibration | 50 | Adam | 0.01 | ~22K |
| 417 | 2 | selective-freeze block adaptation | 10 | SGD | 0.005 | ~7.6M |
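The two phases in the table differ mainly in which parameters stay unfrozen. The following is a minimal sketch of that selection logic, not the code from any of the PRs. It assumes a hypothetical parameter naming scheme (`blocks.<i>.…`, `final_norm.…`, `lm_head.…`, with "norm" or "scale" appearing in the names of norm and scale parameters). The `Phase` dataclass and the `trainable_names` helper are illustrative names, and only the PR 410 values are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    method: str
    epochs: int
    optimizer: str
    learning_rate: float


# Values for PR 410, copied from the table above.
PR_410 = (
    Phase("norm-only recalibration", 100, "Adam", 0.01),
    Phase("selective-freeze block adaptation", 15, "SGD", 0.003),
)

# Assumption: norm and scale parameters can be recognised by substring.
NORM_MARKERS = ("norm", "scale")


def is_norm_or_scale(name: str) -> bool:
    return any(marker in name for marker in NORM_MARKERS)


def trainable_names(names, phase_index, num_blocks, last_k=2):
    """Return the parameter names left unfrozen in phase 0 or phase 1."""
    selected = []
    for name in names:
        if is_norm_or_scale(name):
            # Norms and scales are trainable in both phases.
            selected.append(name)
        elif phase_index == 1:
            if name.startswith("lm_head."):
                selected.append(name)
            elif name.startswith("blocks."):
                # Phase 2 also unfreezes the last `last_k` transformer blocks.
                block_id = int(name.split(".")[1])
                if block_id >= num_blocks - last_k:
                    selected.append(name)
    return selected


# Hypothetical parameter names for a 3-block model.
EXAMPLE_NAMES = [
    "embed.weight",
    "blocks.0.attn.weight",
    "blocks.0.norm.weight",
    "blocks.1.attn.weight",
    "blocks.1.norm.weight",
    "blocks.2.attn.weight",
    "blocks.2.norm.weight",
    "blocks.2.attn_scale",
    "final_norm.weight",
    "lm_head.weight",
]

phase_1_names = trainable_names(EXAMPLE_NAMES, 0, num_blocks=3)
phase_2_names = trainable_names(EXAMPLE_NAMES, 1, num_blocks=3)
```

In a real training loop, the selected names would decide which parameters have gradients enabled and are handed to the optimizer for that phase (Adam for phase 1, SGD for phase 2 in all three PRs). Under this sketch, every parameter trained in phase 1 is also trainable in phase 2, which matches the PR 410 description where phase 2 lists norms and scales alongside the last two blocks and `lm_head`.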