score-first full-model TTT

Category: Test-Time Training
Used in: 1 PR
Best BPB (bits per byte): 1.1532
Avg BPB: 1.1532

Hyperparameters Across PRs

pr_number   parameters
456         {"chunk_size": 32768, "epochs_per_chunk": 1, "learning_rate": 0.0005, "freeze_blocks": 0, "cosine_decay": true, "persistent_across_documents": true}
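The sketch below shows one plausible reading of these settings in a score-first TTT loop. It is not the PR's code, and several assumptions are stated up front. Each chunk is scored with the current weights before the model trains on it. "cosine_decay" is taken to mean the learning rate follows a cosine curve from learning_rate toward 0 across chunks. The "model" is a toy unigram softmax over 256 byte values, chosen so the example runs without a deep-learning framework. Under that toy, bits per token equals bits per byte. The names score_first_ttt and cosine_lr are hypothetical. freeze_blocks=0 is modeled as every parameter being trainable, and persistent_across_documents=true is modeled as never resetting the weights.

```python
import math
from typing import List, Tuple


def cosine_lr(base_lr: float, chunk_idx: int, n_chunks: int) -> float:
    """Assumed schedule: cosine decay from base_lr toward 0 across chunks."""
    if n_chunks <= 1:
        return base_lr
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * chunk_idx / n_chunks))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def score_first_ttt(
    tokens: List[int],
    vocab_size: int = 256,
    chunk_size: int = 32768,
    epochs_per_chunk: int = 1,
    learning_rate: float = 5e-4,
    cosine_decay: bool = True,
) -> Tuple[float, List[float]]:
    # Toy "full model": a unigram logit per token, all trainable (freeze_blocks=0).
    logits = [0.0] * vocab_size
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    total_nll = 0.0  # accumulated negative log-likelihood, in nats

    for idx, chunk in enumerate(chunks):
        # 1) Score first: the chunk is evaluated by weights that have not seen it.
        probs = softmax(logits)
        total_nll += -sum(math.log(probs[t]) for t in chunk)

        # 2) Then adapt on the chunk that was just scored.
        lr = cosine_lr(learning_rate, idx, len(chunks)) if cosine_decay else learning_rate
        counts = [0] * vocab_size
        for t in chunk:
            counts[t] += 1
        for _ in range(epochs_per_chunk):
            probs = softmax(logits)
            # Gradient of mean cross-entropy w.r.t. unigram logits: p - empirical frequency.
            logits = [z - lr * (p - c / len(chunk)) for z, p, c in zip(logits, probs, counts)]
        # Weights carry over to the next chunk and document; nothing is reset.

    bits_per_token = total_nll / (math.log(2) * len(tokens))
    return bits_per_token, logits
```

When the whole stream fits in one chunk, it is scored entirely by the untrained uniform model, so the result is exactly 8 bits per byte. Later chunks can only benefit from adaptation on earlier chunks. That ordering is what keeps a score-first evaluation from training on the tokens it is being graded on.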