causal TTT
Test-Time Training
Used in: 2 PRs
Best BPB: 1.1257
Avg BPB: 1.1257
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 375 | {"learning_rate":0.0001,"chunk_size":32000} |
| 375 | {"learning_rate":0.01,"scope":"last 2 blocks MLP only"} |
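Because the parameters column stores one JSON object per row, the entries can be compared programmatically. The following is a minimal sketch, assuming the rows are copied by hand from the table above; the names `ROWS`, `parse_rows`, and `values_by_key` are hypothetical and not part of any tooling behind this page.

```python
import json

# Rows copied from the table above: (PR number, parameters JSON string).
ROWS = [
    (375, '{"learning_rate":0.0001,"chunk_size":32000}'),
    (375, '{"learning_rate":0.01,"scope":"last 2 blocks MLP only"}'),
]


def parse_rows(rows):
    """Decode each parameters cell into a dict, keeping the PR number alongside."""
    return [{"pr_number": pr, **json.loads(params)} for pr, params in rows]


def values_by_key(records):
    """Collect every value seen for each hyperparameter, in row order."""
    seen = {}
    for record in records:
        for key, value in record.items():
            if key != "pr_number":
                seen.setdefault(key, []).append(value)
    return seen


records = parse_rows(ROWS)
summary = values_by_key(records)
for key, values in summary.items():
    print(key, values)
```

Grouping by key makes it easy to see that the two entries use different learning rates, and that `chunk_size` and `scope` each appear in only one of them.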