
score-first TTT with EB-adaptive per-layer scaling

Test-Time Training
Used in: 1 PR
Best BPB: 1.1185
Avg BPB: 1.1185

Hyperparameters Across PRs

pr_number | parameters
484 | {"freeze_embeddings":true,"burst_epochs":2,"burst_lr_multiplier":0.1,"layer_scale_formula":"clip(|E[grad_i]| / std(grad_i), 0.3, 3.0)"}
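The `layer_scale_formula` entry can be read as a clipped signal-to-noise ratio of each layer's gradient. The sketch below is one possible reading, not the code from PR 484: it assumes the mean and standard deviation are taken over the flattened gradient values of layer `i`, uses the population standard deviation, and adds a small `eps` to avoid division by zero. The function name `layer_scale` and its parameters are hypothetical.

```python
import statistics


def layer_scale(grads, lo=0.3, hi=3.0, eps=1e-12):
    """Return clip(|mean(grads)| / std(grads), lo, hi) for one layer.

    `grads` is a non-empty sequence of floats: the flattened gradient
    values of a single layer (an assumption; the source does not say
    over which axis the statistics are computed).
    """
    mean = statistics.fmean(grads)
    std = statistics.pstdev(grads)  # population std; an assumption
    ratio = abs(mean) / (std + eps)  # eps guards against std == 0
    return min(max(ratio, lo), hi)  # clip into [lo, hi]


# Hypothetical per-layer gradients, for illustration only.
example_grads = {
    "layer_0": [0.02, 0.01, 0.03, 0.02],     # consistent sign, low noise
    "layer_1": [0.50, -0.40, 0.45, -0.55],   # noisy, mean near zero
}
scales = {name: layer_scale(g) for name, g in example_grads.items()}
```

Under this reading, layers whose gradients agree in sign get a scale toward the upper bound of 3.0, while layers with noisy, near-zero-mean gradients are damped toward 0.3. How the scale is then applied (for example as a per-layer learning-rate multiplier alongside `burst_lr_multiplier`) is not stated in the parameters above.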