Discriminative per-block pre-quant TTT

Test-Time Training
Used in: 1 PR
Best BPB: 1.0050
Avg BPB: 1.0050

Hyperparameters Across PRs

pr_number    parameters
1372         {"graduated_lr":"0.3x->1.0x","layer_groups":10}
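The parameters for PR 1372 suggest a learning-rate multiplier that ramps from 0.3x to 1.0x across 10 layer groups. The listing gives only those two values, so the following is a minimal sketch under stated assumptions: the ramp is taken to be linear, the first group is taken to receive the 0.3x multiplier, and the function names and `base_lr` value are hypothetical rather than taken from the PR.

```python
def graduated_lr_multipliers(num_groups=10, start=0.3, end=1.0):
    """Return one LR multiplier per layer group.

    Assumption: multipliers are spaced linearly from `start` to `end`.
    The source only states "0.3x->1.0x" and 10 layer groups.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    if num_groups == 1:
        return [end]
    step = (end - start) / (num_groups - 1)
    return [start + i * step for i in range(num_groups)]


def group_learning_rates(base_lr, multipliers):
    """Scale a base learning rate by each group's multiplier."""
    return [base_lr * m for m in multipliers]


if __name__ == "__main__":
    multipliers = graduated_lr_multipliers()
    # base_lr is a placeholder value chosen only for illustration.
    rates = group_learning_rates(base_lr=1e-3, multipliers=multipliers)
    for group_index, (mult, rate) in enumerate(zip(multipliers, rates)):
        print(f"group {group_index}: multiplier={mult:.3f} lr={rate:.6f}")
```

In an optimizer that accepts per-parameter-group settings, each of the resulting rates would be assigned to the parameters of the corresponding layer group. Which end of the network gets the smaller multiplier is not stated in the listing and should be checked against the PR itself.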