two-phase TTT

Test-Time Training

Used in: 3 PRs

Best BPB: 1.1216

Avg BPB: 1.1220

Submissions

PR #410 by EthanYangTW

PR #415 by EthanYangTW

PR #417 by EthanYangTW

Hyperparameters Across PRs

PR	Phase	Method	Epochs	Optimizer	Learning rate	Trainable parameters
410	1	norm-only recalibration	100	Adam	0.01	LayerNorm weights, scales, final_norm
410	2	selective-freeze block adaptation	15	SGD	0.003	last 2 transformer blocks, norms, scales, lm_head
415	1	norm-only recalibration	100	Adam	0.01	~22K
415	2	selective-freeze block adaptation	25	SGD	0.005	~7.6M
417	1	norm-only recalibration	50	Adam	0.01	~22K
417	2	selective-freeze block adaptation	10	SGD	0.005	~7.6M
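The two phases differ mainly in which parameters are unfrozen. As a rough illustration only, the sketch below selects trainable parameters by name for each phase, following the PR #410 description (phase 1: norm weights, scales, final_norm; phase 2: last 2 transformer blocks, norms, scales, lm_head). The parameter names (`blocks.N.…`, `final_norm`, `lm_head`), the helper functions, and the matching rules are assumptions made for this example and are not taken from the PRs; the actual submissions may name and select parameters differently.

```python
import re

# Hypothetical parameter names for a 4-block model (not from the PRs).
EXAMPLE_NAMES = [
    "embed.weight",
    "blocks.0.attn.qkv.weight",
    "blocks.0.norm1.weight",
    "blocks.1.attn.qkv.weight",
    "blocks.1.norm1.weight",
    "blocks.2.mlp.fc.weight",
    "blocks.2.attn.scale",
    "blocks.3.mlp.fc.weight",
    "final_norm.weight",
    "lm_head.weight",
]

# Phase settings as listed for PR #410 in the table above.
PHASES_PR410 = [
    {"name": "phase_1", "epochs": 100, "optimizer": "Adam", "lr": 0.01},
    {"name": "phase_2", "epochs": 15, "optimizer": "SGD", "lr": 0.003},
]


def phase1_trainable(name: str) -> bool:
    """Phase 1 (norm-only recalibration): norms and scales, incl. final_norm.

    Assumption: norm parameters contain "norm" in their name and scale
    parameters end in "scale".
    """
    return "norm" in name or name.endswith("scale")


def phase2_trainable(name: str, num_blocks: int, last_k: int = 2) -> bool:
    """Phase 2 (selective-freeze): last_k blocks, norms, scales, lm_head."""
    if phase1_trainable(name) or name.startswith("lm_head."):
        return True
    match = re.match(r"blocks\.(\d+)\.", name)
    return match is not None and int(match.group(1)) >= num_blocks - last_k


def trainable_sets(param_names, num_blocks):
    """Return the parameter names left unfrozen in each phase."""
    return {
        "phase_1": [n for n in param_names if phase1_trainable(n)],
        "phase_2": [n for n in param_names if phase2_trainable(n, num_blocks)],
    }


if __name__ == "__main__":
    sets = trainable_sets(EXAMPLE_NAMES, num_blocks=4)
    for phase in PHASES_PR410:
        names = sets[phase["name"]]
        print(phase["name"], phase["optimizer"], phase["lr"], len(names), "tensors")
```

In a real training loop, each phase would set `requires_grad` (or the framework's equivalent) from these name sets before building the optimizer listed for that phase; that step is omitted here to keep the sketch dependency-free.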