val_bpb: 1.1950
Architecture: Transformer
Optimizer: Adam
Artifact Size: —
Training Techniques
Evaluation
sliding window eval
parameters: {"chunk_size":256,"eval_seq_len":1024}
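The sliding window eval above (chunk_size 256, eval_seq_len 1024) amounts to strided evaluation: a 1024-token window advances in 256-token steps, and each window scores only the tokens no earlier window has scored, so every token is evaluated exactly once with up to 1024 tokens of left context. A minimal sketch of the window bookkeeping, assuming this standard strided scheme (the function name and span layout are illustrative, not the submission's code):

```python
def strided_windows(n_tokens, eval_seq_len=1024, stride=256):
    """Return (ctx_start, ctx_end, score_start, score_end) spans for
    strided LM evaluation: each window sees up to eval_seq_len tokens
    of context, but only tokens not scored by a previous window
    contribute to the loss."""
    spans = []
    prev_end = 0
    for ctx_start in range(0, n_tokens, stride):
        ctx_end = min(ctx_start + eval_seq_len, n_tokens)
        # Score only the tokens this window sees for the first time.
        spans.append((ctx_start, ctx_end, prev_end, ctx_end))
        prev_end = ctx_end
        if ctx_end == n_tokens:
            break
    return spans
```

Summing per-token bits over the scored spans and dividing by the byte count of the text then yields a bits-per-byte figure like val_bpb.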
Test-Time Training
LoRA TTT
parameters: {"rank":8,"learning_rate":0.01,"betas":[0.9,0.95]}
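LoRA TTT with rank 8 means each frozen weight W is augmented with a low-rank update B @ A that is trained per document at evaluation time (here with Adam, learning rate 0.01, betas (0.9, 0.95)). A dependency-free sketch of the adapted forward pass and the zero initialization of B that makes the adapter a no-op before any test-time steps (helper names and the toy shapes are illustrative assumptions):

```python
def matvec(M, x):
    """Plain matrix-vector product for small dense matrices."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(x, W, A, B, scale=1.0):
    """y = W x + scale * B (A x), with A of shape (r, d_in) and
    B of shape (d_out, r). With B initialized to zeros, the
    adapted layer is exactly the frozen base layer."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scale * l for b, l in zip(base, low_rank)]

# Frozen 2x2 base weight and a rank-1 adapter (rank 8 in the actual run).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]        # (r=1, d_in=2)
B = [[0.0], [0.0]]      # zero-init: adapter starts as a no-op
x = [2.0, 4.0]
```

Only A and B receive gradients during test-time training; W stays frozen, which keeps the per-document optimizer state small.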
Sequence Length
train_length: null
eval_length: 1024
Other
Document masking: evaluation is document-isolated. BOS tokens mark document boundaries, and the LoRA parameters are reset at each boundary so that no adapted state leaks across validation sequences.
parameters: {"batch_size":64,"targets":["lm_head","c_q","c_v"]}
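The BOS-based boundary detection and per-document reset described above can be sketched as a split of the flat token stream followed by an adapter reset before each document is evaluated (both functions and the reset hook are illustrative assumptions, not the submission's code):

```python
def split_on_bos(tokens, bos_id):
    """Split a flat token stream into documents at BOS tokens.
    Each document keeps its leading BOS; any tokens before the
    first BOS form their own segment."""
    docs, current = [], []
    for tok in tokens:
        if tok == bos_id and current:
            docs.append(current)
            current = []
        current.append(tok)
    if current:
        docs.append(current)
    return docs

def evaluate_isolated(tokens, bos_id, reset_adapter, eval_doc):
    """Reset the LoRA adapter before each document so no adapted
    state carries over, then evaluate that document alone."""
    losses = []
    for doc in split_on_bos(tokens, bos_id):
        reset_adapter()  # per-document reset: no cross-document leakage
        losses.append(eval_doc(doc))
    return losses
```

Because each document is scored with a freshly reset adapter, the evaluation never rewards information carried over from a neighboring validation document.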
Novel Contributions
- Per-document LoRA test-time training during evaluation
- Sliding window / strided evaluation over overlapping chunks
- Document-aware evaluation with BOS-based boundary detection and no leakage across documents
- Batching and sorting documents by length for faster per-sequence adaptation
- Applying LoRA adapters to lm_head, c_q, and c_v projections in all transformer blocks
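The length-based batching contribution above can be sketched as: sort document indices by length, then slice consecutive indices into fixed-size batches so each batch pads only to the length of its own longest member (batch_size is 64 in the run; the helper names below are illustrative assumptions):

```python
def length_sorted_batches(docs, batch_size=64):
    """Group documents of similar length into batches so per-batch
    padding (to the longest member of the batch) stays small."""
    order = sorted(range(len(docs)), key=lambda i: len(docs[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def padding_waste(docs, batches):
    """Padded slots minus real tokens, summed over all batches."""
    waste = 0
    for batch in batches:
        longest = max(len(docs[i]) for i in batch)
        waste += sum(longest - len(docs[i]) for i in batch)
    return waste
```

Since per-sequence adaptation runs once per batch element, grouping similar lengths wastes fewer adaptation steps on padding than batching documents in arrival order.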