PR #1966

closed

Record: LoRA-TTT chunk size 36 — val_bpb 1.06900 (3-seed mean)

by renqianluo
val_bpb: 1.0690
Architecture: Transformer
Optimizer:
Artifact Size: 15,977,143 B

Training Techniques

Test-Time Training: LoRA TTT
  parameters: {"chunk_size": 36}
Regularization: logit softcap
  parameters: null

Novel Contributions

  • Reduced TTT chunk size from 48 to 36, improving the 3-seed mean val_bpb
  • Observed sharp, non-monotonic dependence of LoRA-TTT performance on chunk size
  • Showed that combining rank=96 with chunk size 36 did not yield compounding improvements
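To illustrate what "LoRA TTT with a chunk size" means mechanically, here is a minimal NumPy sketch: a frozen base weight is augmented with low-rank factors that are adapted by a few gradient steps on each chunk of tokens before that chunk is processed. All names, the toy reconstruction loss, and the hyperparameters are hypothetical illustrations, not the record's actual code; the record only specifies chunk_size=36.

```python
import numpy as np

def lora_ttt(x, W, rank=4, chunk_size=36, lr=0.1, steps=10):
    """Hypothetical sketch of chunked LoRA test-time training.

    x: (T, d) token features; W: (d, d) frozen base weight.
    For each chunk of `chunk_size` rows, the low-rank factors A, B are
    adapted on a toy self-supervised reconstruction loss
    0.5 * ||chunk @ (W + A @ B) - chunk||^2, then the adapted layer is
    applied to that chunk. Adapters persist across chunks.
    """
    T, d = x.shape
    rng = np.random.default_rng(0)
    A = rng.normal(scale=0.1, size=(d, rank))  # LoRA down-projection
    B = np.zeros((rank, d))                    # LoRA up-projection (zero init)
    out = np.empty_like(x)
    for start in range(0, T, chunk_size):
        chunk = x[start:start + chunk_size]
        for _ in range(steps):
            y = chunk @ (W + A @ B)            # base path + low-rank update
            err = y - chunk                    # residual of the toy loss
            gB = A.T @ chunk.T @ err           # dL/dB
            gA = chunk.T @ err @ B.T           # dL/dA
            A -= lr * gA
            B -= lr * gB
        out[start:start + chunk_size] = chunk @ (W + A @ B)
    return out
```

Smaller chunks mean more frequent adapter updates per token but fewer tokens per update, which is one plausible reading of the sharp, non-monotonic dependence on chunk size noted above.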