freeze early layers during TTT

Regularization
Used in: 1 PR
Best BPB: 1.0983
Avg BPB: 1.0983

Hyperparameters Across PRs

pr_number  parameters
628        {"frozen_blocks":2,"total_blocks":11}
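The recorded hyperparameters can be read as "freeze the first 2 of 11 blocks". The sketch below illustrates that reading only; it is not taken from PR 628. It assumes that frozen_blocks counts from the input side (as "early layers" suggests) and that freezing means turning off gradient updates for those blocks during TTT. Param, Block, and freeze_early_blocks are hypothetical stand-ins so the example runs with the standard library alone; in PyTorch the same loop would work on real modules, since they expose parameters() and a requires_grad flag.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Param:
    """Hypothetical stand-in for a trainable tensor; only requires_grad matters."""
    requires_grad: bool = True


@dataclass
class Block:
    """Hypothetical stand-in for one model block holding a few parameters."""
    params: list = field(default_factory=lambda: [Param(), Param()])

    def parameters(self):
        return iter(self.params)


def freeze_early_blocks(blocks, frozen_blocks):
    """Disable gradients for the first `frozen_blocks` blocks.

    Returns the indices of the blocks that remain trainable.
    Assumption: blocks are ordered from input to output.
    """
    if not 0 <= frozen_blocks <= len(blocks):
        raise ValueError("frozen_blocks must be between 0 and len(blocks)")
    for index, block in enumerate(blocks):
        trainable = index >= frozen_blocks
        for param in block.parameters():
            param.requires_grad = trainable
    return [i for i in range(len(blocks)) if i >= frozen_blocks]


# Hyperparameters as recorded for PR 628.
config = json.loads('{"frozen_blocks":2,"total_blocks":11}')
blocks = [Block() for _ in range(config["total_blocks"])]
trainable_indices = freeze_early_blocks(blocks, config["frozen_blocks"])
print(trainable_indices)
```

Under these assumptions, blocks 0 and 1 stay fixed and the remaining nine blocks (indices 2 through 10) are the only ones adapted at test time.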