
tiny eval-time SGD

Test-Time Training
Used in: 1 PR
Best BPB: 1.2427
Avg BPB: 1.2427

Hyperparameters Across PRs

PR    Parameters
272   {"targets": ["q_gain", "attn_scale", "mlp_scale", "resid_mix", "skip_weights"]}
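The one-line description ("tiny eval-time SGD") and the target list for PR 272 point to the general idea. During evaluation, small SGD steps are taken on only a handful of gain/scale-style parameters, and every other weight stays frozen. The following is a minimal sketch of that idea on a toy model. It is not PR 272's code. Several choices here are hypothetical and made only to keep the example stdlib-only: the chunked score-then-update ordering, one step per chunk, the learning rate, the toy model with its parameter names (`w`, `gain`, `bias`), and finite differences used in place of autograd.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]
Chunk = Sequence[Tuple[float, float]]
LossFn = Callable[[Params, Chunk], float]


def toy_loss(params: Params, chunk: Chunk) -> float:
    """MSE of a hypothetical toy model: pred = gain * (w * x) + bias."""
    total = 0.0
    for x, y in chunk:
        pred = params["gain"] * (params["w"] * x) + params["bias"]
        total += (pred - y) ** 2
    return total / len(chunk)


def finite_diff_grad(loss_fn: LossFn, params: Params, chunk: Chunk,
                     name: str, eps: float = 1e-5) -> float:
    """Central-difference estimate of d(loss)/d(params[name]); stands in for autograd."""
    plus = dict(params, **{name: params[name] + eps})
    minus = dict(params, **{name: params[name] - eps})
    return (loss_fn(plus, chunk) - loss_fn(minus, chunk)) / (2 * eps)


def eval_with_ttt(params: Params, chunks: Sequence[Chunk], loss_fn: LossFn,
                  targets: Sequence[str], lr: float = 0.1) -> Tuple[float, Params]:
    """Score chunks in order, taking one SGD step on `targets` after each chunk is scored."""
    adapted = dict(params)  # copy so the trained weights are left untouched
    losses: List[float] = []
    for chunk in chunks:
        # Score first: the chunk's loss is recorded before the model adapts on it.
        losses.append(loss_fn(adapted, chunk))
        # Then one plain SGD step restricted to the target parameters; all others stay frozen.
        grads = {name: finite_diff_grad(loss_fn, adapted, chunk, name) for name in targets}
        for name, g in grads.items():
            adapted[name] -= lr * g
    return sum(losses) / len(losses), adapted


trained = {"w": 1.0, "gain": 1.0, "bias": 0.0}
# Hypothetical eval data from a shifted relationship (y = 2x + 1) the trained model did not fit.
chunks = [[(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)] for _ in range(5)]

baseline, _ = eval_with_ttt(trained, chunks, toy_loss, targets=[])
with_ttt, adapted = eval_with_ttt(trained, chunks, toy_loss, targets=["gain", "bias"])
print(f"no TTT: {baseline:.4f}  with TTT: {with_ttt:.4f}")
```

Because each chunk is scored before the model updates on it, the reported loss never includes a chunk the model has already adapted to. In a real transformer, targets such as `q_gain` or `mlp_scale` would typically be tensors updated through an autograd optimizer, not scalars nudged by finite differences.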