PR #1610

open

Add phased global SGD TTT prefix submission

by romeerp
val_bpb
1.0729
Architecture
Transformer
Optimizer
SGD
Artifact Size

Training Techniques

Test-Time Training
LoRA TTT
parameters: {"prefix_docs":2000,"phased":true,"global_sgd":true}
Optimizer
SGD
weight_decay: null
momentum: null
other_params: {"distributed":true}

Novel Contributions

  • Pauses evaluation after scoring a prefix of 2000 documents
  • Runs global SGD only on documents that have already been fully scored
  • Resumes the same evaluation queue with the updated base model
  • Builds on PR #1530's LoRA TTT evaluator without changing training
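
The phased flow listed above can be sketched with a toy model. This is only an illustration under stated assumptions, not the code from this PR: `ToyModel`, `phased_eval`, the squared-error loss, and the `lr` and `epochs` arguments are all hypothetical, and the per-document LoRA TTT step and the distributed SGD are left out. What it tries to show is the ordering: every document is scored before the model is ever updated on it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyModel:
    """Hypothetical stand-in for the base model: one scalar parameter."""
    mu: float = 0.0

    def loss(self, doc: Sequence[float]) -> float:
        # Mean squared error of the document's values against mu.
        return sum((x - self.mu) ** 2 for x in doc) / len(doc)

    def grad(self, doc: Sequence[float]) -> float:
        # Derivative of the loss above with respect to mu.
        return sum(2.0 * (self.mu - x) for x in doc) / len(doc)


def phased_eval(
    model: ToyModel,
    docs: List[Sequence[float]],
    prefix_docs: int,
    lr: float = 0.1,   # assumed value, not taken from the PR
    epochs: int = 1,   # assumed value, not taken from the PR
) -> List[float]:
    scores: List[float] = []

    # Phase 1: score the prefix with the unmodified base model.
    for doc in docs[:prefix_docs]:
        scores.append(model.loss(doc))

    # Pause: plain SGD (no momentum, no weight decay) over the prefix only.
    # These documents are already scored, so the update cannot change
    # their recorded scores.
    for _ in range(epochs):
        for doc in docs[:prefix_docs]:
            model.mu -= lr * model.grad(doc)

    # Phase 2: resume the same queue with the updated model.
    for doc in docs[prefix_docs:]:
        scores.append(model.loss(doc))

    return scores


if __name__ == "__main__":
    queue = [[1.0, 1.0]] * 4
    print(phased_eval(ToyModel(), queue, prefix_docs=2, lr=0.25))
```

In this sketch the prefix scores are fixed before any update happens, so the only documents that can benefit from the SGD step are the ones still waiting in the queue.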