val_bpb
1.0729
Architecture
Transformer
Optimizer
SGD
Artifact Size
—
Training Techniques
Test-Time Training
LoRA TTT
parameters: {"prefix_docs":2000,"phased":true,"global_sgd":true}
Optimizer
SGD
weight_decay: null
momentum: null
other_params: {"distributed":true}
Novel Contributions
- Phased evaluation that pauses once a prefix of 2000 documents has been scored
- Runs global SGD only on documents already fully scored
- Resumes the same evaluation queue with the updated base model
- Builds on PR #1530's LoRA TTT evaluator; the training procedure is unchanged
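
The phased schedule listed above can be sketched as follows. This is a toy illustration under stated assumptions, not the evaluator from the PR: the "model" is a single scalar `theta`, the loss is mean squared error, and the names `phased_eval`, `lr`, and `sgd_steps` are hypothetical. The per-document LoRA TTT step and the distributed execution are omitted. What the sketch does show is the ordering constraint: the global SGD update only ever sees documents that have already been scored, and the remaining queue is then scored with the updated parameter.

```python
from typing import List, Sequence, Tuple


def phased_eval(
    docs: Sequence[Sequence[float]],
    theta: float,
    prefix_docs: int,
    lr: float = 0.1,
    sgd_steps: int = 1,
) -> Tuple[List[float], float]:
    """Score docs in queue order; pause once after `prefix_docs` documents.

    Toy stand-in (assumption): the base model is one scalar `theta` and a
    document's loss is the mean squared error between `theta` and its values.
    """
    losses: List[float] = []
    for i, doc in enumerate(docs):
        if i == prefix_docs and i > 0:
            # Pause point: docs[:i] are fully scored, so updating on them
            # cannot use any token that still has to be evaluated.
            scored = docs[:i]
            for _ in range(sgd_steps):
                for d in scored:
                    # Gradient of mean((theta - x)^2) with respect to theta.
                    grad = sum(2.0 * (theta - x) for x in d) / len(d)
                    theta -= lr * grad
        # Score the current document with whatever theta is in effect:
        # the original one before the pause, the updated one after it.
        losses.append(sum((theta - x) ** 2 for x in doc) / len(doc))
    return losses, theta


if __name__ == "__main__":
    queue = [[1.0], [1.0], [1.0], [1.0]]
    per_doc_loss, final_theta = phased_eval(queue, theta=0.0, prefix_docs=2, lr=0.25)
    print(per_doc_loss, final_theta)
```

In this toy run the first two documents are scored with the original parameter, and the last two with the parameter updated on the first two. The recorded losses for the prefix are never recomputed, which is the property that keeps the evaluation free of leakage from unscored documents.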