score-first AdamW TTT

Test-Time Training
Used in: 2 PRs
Best BPB: 1.1172
Avg BPB: 1.1176

Hyperparameters Across PRs

PR number | Parameters
544 | {"chunk_tokens":131072,"epochs":3,"learning_rate":0.0001,"freeze_blocks":2,"stride":32}
790 | {"chunk":131072,"unfrozen":"last 2 blocks plus control params","grouped_optimizer":true}
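The two PRs record their settings under different key names, so a direct side-by-side read is awkward. The following is a minimal sketch of how the records in the table above could be parsed and compared. The variable names are illustrative, and treating "chunk_tokens" and "chunk" as the same setting is an assumption, not something the table states.

```python
import json

# Parameter records copied verbatim from the table above, keyed by PR number.
records = {
    544: '{"chunk_tokens":131072,"epochs":3,"learning_rate":0.0001,"freeze_blocks":2,"stride":32}',
    790: '{"chunk":131072,"unfrozen":"last 2 blocks plus control params","grouped_optimizer":true}',
}

# Decode each JSON string into a dict.
params = {pr: json.loads(raw) for pr, raw in records.items()}

# Keys that appear in both records, and keys unique to each one.
shared_keys = set(params[544]) & set(params[790])
only_544 = sorted(set(params[544]) - shared_keys)
only_790 = sorted(set(params[790]) - shared_keys)

# Assumption: "chunk_tokens" (PR 544) and "chunk" (PR 790) name the same setting.
chunk_544 = params[544]["chunk_tokens"]
chunk_790 = params[790]["chunk"]
same_chunk = chunk_544 == chunk_790

print("shared keys:", sorted(shared_keys))
print("only in 544:", only_544)
print("only in 790:", only_790)
print("same chunk size:", same_chunk)
```

Under that assumption, both PRs use a chunk of 131072 (2^17) tokens. No other setting can be compared directly, because the records share no key names.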