PR #1966
Status: closed
Record: LoRA-TTT chunk size 36 — val_bpb 1.06900 (3-seed mean)
by renqianluo
val_bpb
1.0690
Architecture
Transformer
Optimizer
—
Artifact Size
15,977,143 B
Training Techniques
Test-Time Training
LoRA TTT
parameters: {"chunk_size":36}
Regularization
logit softcap
parameters: null
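The Regularization field lists logit softcap but records no parameters. As a reference point, logit soft-capping is conventionally implemented as a scaled tanh squash of the pre-softmax logits; the cap value below is an assumed placeholder, not a value taken from this PR:

```python
import numpy as np

def soft_cap(logits, cap=30.0):
    """Logit soft-capping: smoothly bounds logits to (-cap, cap).

    cap=30.0 is an illustrative default; the PR does not record the
    actual cap used.
    """
    logits = np.asarray(logits, dtype=np.float64)
    return cap * np.tanh(logits / cap)
```

Near zero the function is approximately the identity, so small logits pass through almost unchanged while outliers are bounded, which limits loss spikes from extreme logits.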
Novel Contributions
- Reduced TTT chunk size from 48 to 36, improving the 3-seed mean val_bpb
- Observed sharp, non-monotonic dependence of LoRA-TTT performance on chunk size
- Showed that rank=96 combined with chunk size 36 did not compound improvements
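To make the recorded technique concrete, here is a minimal sketch of the two pieces the record names: a LoRA-style forward pass (frozen base weight plus a scaled low-rank update) and the splitting of a token stream into test-time-training chunks of size 36. All names, shapes, and the update scheduling are illustrative assumptions; the PR's actual implementation is not reproduced here:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA layer forward: y = x W^T + (alpha / rank) * x A^T B^T.

    W is the frozen base weight; A (rank x d_in) and B (d_out x rank)
    form the trainable low-rank update that TTT would adapt.
    """
    rank = A.shape[0]
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

def ttt_chunks(tokens, chunk_size=36):
    """Split a token sequence into consecutive chunks.

    In a LoRA-TTT loop, the adapter parameters (A, B) would be updated
    on each chunk before processing the next; chunk_size=36 mirrors the
    value recorded in this PR.
    """
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```

Under this sketch, the chunk size controls how often the adapters are refreshed during evaluation, which is consistent with the record's observation that performance depends sharply and non-monotonically on it.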