sliding window eval + Test-Time Training (TTT)

Evaluation
Used in: 1 PR
Best BPB: 1.1354
Avg BPB: 1.1354

Hyperparameters Across PRs

pr_number  parameters
562        {"TTT_epochs":22,"TTT_batch_size":32,"distributed_sync":"all_reduce per step"}
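
As a rough illustration of the evaluation side, the sketch below loads the hyperparameters recorded for PR 562 and shows one common way to lay out sliding-window scoring and convert a summed loss to BPB. It assumes BPB means bits per byte, its usual meaning. The helpers `sliding_windows` and `bits_per_byte`, and the `window` and `stride` values in the usage lines, are hypothetical names and numbers chosen for illustration; none of them come from the PR.

```python
import json
import math

# Hyperparameters exactly as recorded for PR 562.
config = json.loads(
    '{"TTT_epochs":22,"TTT_batch_size":32,'
    '"distributed_sync":"all_reduce per step"}'
)


def sliding_windows(n_tokens, window, stride):
    """Yield (start, end, score_from) spans over a token sequence.

    Hypothetical helper: each window covers tokens [start, end), but only
    tokens in [score_from, end) are scored, so every token is counted once
    while later tokens still see earlier context.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    start = 0
    scored_upto = 0
    while scored_upto < n_tokens:
        end = min(start + window, n_tokens)
        yield start, end, scored_upto
        scored_upto = end
        start += stride


def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)


# Illustrative values only: 10 tokens, window of 4, stride of 2.
spans = list(sliding_windows(10, window=4, stride=2))
print(spans)
print(config["TTT_epochs"], config["TTT_batch_size"])
```

After the first window, each span scores only its last `stride` tokens, which trades extra forward passes for longer context per scored token. The test-time training loop is left out here, since the source records only its epoch count, batch size, and sync mode.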