score-first TTT with EB-adaptive per-layer scaling
Test-Time Training
Used in: 1 PR
Best BPB: 1.1185
Avg BPB: 1.1185
Submissions
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 484 | {"freeze_embeddings":true,"burst_epochs":2,"burst_lr_multiplier":0.1,"layer_scale_formula":"clip(\|E[grad_i]\| / std(grad_i), 0.3, 3.0)"} |
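
The `layer_scale_formula` entry can be read as a per-layer signal-to-noise ratio, clipped to the range [0.3, 3.0]. The snippet below is a minimal sketch of that formula only, under several assumptions not stated in the table: `grad_i` is taken to be the flattened gradient of layer `i`, `std` is taken as the population standard deviation, a small `eps` guards against division by zero, and the function name `layer_scale` and the sample gradients are hypothetical. How PR 484 actually applies the resulting scale (for example, as a learning-rate multiplier) is not specified here.

```python
import statistics


def layer_scale(grad, lo=0.3, hi=3.0, eps=1e-12):
    """Return clip(|E[grad]| / std(grad), lo, hi) for one layer's flattened gradient.

    Assumption: population standard deviation; eps avoids division by zero.
    """
    mean = statistics.fmean(grad)
    std = statistics.pstdev(grad)
    ratio = abs(mean) / (std + eps)
    return min(max(ratio, lo), hi)


# Hypothetical flattened gradients, one list per layer (illustration only).
grads = {
    "layer0": [0.9, 1.0, 1.1, 1.0],    # consistent direction, low noise
    "layer1": [1.0, -1.0, 1.0, -1.0],  # zero mean, pure noise
    "layer2": [0.0, 2.0, 0.0, 2.0],    # mean and std both equal 1
}

scales = {name: layer_scale(g) for name, g in grads.items()}
for name, scale in scales.items():
    print(f"{name}: {scale:.3f}")
```

Under this reading, a layer whose gradient entries agree in direction hits the upper bound of 3.0, a layer whose entries cancel out falls to the lower bound of 0.3, and layers in between receive a scale equal to the unclipped ratio.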