PR #2033
open · [non-record] Stabilized phased TTT LR retune - H200 screening val_bpb 1.05868
by ayushozha
val_bpb
1.0587
Architecture
Transformer
Optimizer
—
Artifact Size
15,902,950 bytes
Training Techniques
Test-Time Training
LoRA TTT
parameters: {"learning_rate":0.00007,"rank":80}
Architecture
SmearGate
BOS-fixed SmearGate path inherited from the accepted top-stack submission
parameters: null
weight tying
Inherited from the accepted top-stack submission
parameters: null
Compression
per-group lrzip
level: null
Regularization
weight decay
parameters: {"value":0.3}
Novel Contributions
- Lowered the phased TTT LoRA learning rate from 1e-4 to 7e-5
- Non-record H200 screening of a stabilized phased TTT retune
- Retained the accepted top-stack components while tuning only the TTT LR
- Reported improved H200 screening val_bpb of 1.05868018
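The TTT hyperparameters listed above (learning_rate=7e-5, rank=80, weight_decay=0.3) can be illustrated with a toy sketch of LoRA-style test-time training. This is a hypothetical reconstruction, not the submission's code: the `LoRALinear` class, the `ttt_step` function, the squared-error loss, and the toy dimensions are all assumptions for illustration; only the learning rate, the LoRA rank concept, and the weight-decay value come from the card (the sketch uses rank=2 instead of 80 to stay readable).

```python
import random

def matvec(M, v):
    # Plain matrix-vector product over nested lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A.

    Hypothetical toy version: the real submission reports rank=80;
    here rank is kept tiny so the mechanics stay visible.
    """
    def __init__(self, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Standard LoRA init: A random, B zero, so the update starts at zero.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0 for _ in range(rank)] for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)            # frozen path
        low = matvec(self.B, matvec(self.A, x))  # trainable low-rank path
        return [b + l for b, l in zip(base, low)]

def ttt_step(layer, x, target, lr=7e-5, weight_decay=0.3):
    """One test-time-training step on the LoRA factors only (W stays frozen).

    Squared-error loss for illustration; weight decay is applied in
    decoupled (AdamW-style) form. lr and weight_decay default to the
    values reported in the card.
    """
    y = layer.forward(x)
    dy = [2.0 * (yi - ti) for yi, ti in zip(y, target)]  # dL/dy
    ax = matvec(layer.A, x)                              # rank-dim activation
    # Gradient pieces: dL/dB = dy ⊗ (A x), dL/dA = (Bᵀ dy) ⊗ x.
    bt_dy = [sum(layer.B[i][r] * dy[i] for i in range(len(dy)))
             for r in range(len(layer.A))]
    for i in range(len(layer.B)):
        for r in range(len(layer.B[0])):
            g = dy[i] * ax[r]
            layer.B[i][r] -= lr * g + lr * weight_decay * layer.B[i][r]
    for r in range(len(layer.A)):
        for j in range(len(layer.A[0])):
            g = bt_dy[r] * x[j]
            layer.A[r][j] -= lr * g + lr * weight_decay * layer.A[r][j]
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))

# Toy usage: three TTT steps on one example at the retuned learning rate.
layer = LoRALinear(d_out=4, d_in=4, rank=2)  # card reports rank=80
x = [1.0, 0.5, -0.5, 2.0]
target = [0.0, 1.0, 0.0, -1.0]
losses = [ttt_step(layer, x, target) for _ in range(3)]
```

At lr=7e-5 each step moves the adapters only slightly, which matches the stated intent of the retune: trading per-step progress for stability relative to the earlier 1e-4 setting.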