PR #2033

open

[non-record] Stabilized phased TTT LR retune - H200 screening val_bpb 1.05868

by ayushozha
val_bpb: 1.0587
Architecture: Transformer
Optimizer:
Artifact Size: 15,902,950 bytes

Training Techniques

Test-Time Training
  • LoRA TTT: parameters {"learning_rate":0.00007,"rank":80}

Architecture
  • SmearGate: BOS-fixed SmearGate path inherited from the accepted top-stack submission (parameters: null)
  • weight tying: inherited from the accepted top-stack submission (parameters: null)

Compression
  • per-group lrzip (level: null)

Regularization
  • weight decay: parameters {"value":0.3}
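The LoRA TTT entry adapts a frozen transformer at test time through low-rank factors. A minimal sketch of the standard LoRA parameterization, assuming a plain `W + (alpha/r) * B @ A` decomposition; only the rank of 80 comes from the listed parameters, while the layer dimensions, scaling, and initialization here are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Linear layer with a LoRA adapter: x @ (W + (alpha/r) * B @ A).T.

    W is the frozen base weight (out_dim, in_dim); A (rank, in_dim) and
    B (out_dim, rank) are the small trainable factors updated at test time.
    """
    rank = A.shape[0]
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

# Illustrative dimensions; only rank=80 comes from the listed parameters.
rng = np.random.default_rng(0)
in_dim, out_dim, rank = 512, 512, 80
W = rng.standard_normal((out_dim, in_dim))
A = rng.standard_normal((rank, in_dim))   # down-projection, random init
B = np.zeros((out_dim, rank))             # up-projection, zero init
x = rng.standard_normal((4, in_dim))
y = lora_forward(x, W, A, B)
# With B zero-initialized the adapter starts as an exact no-op: y == x @ W.T
```

Zero-initializing one factor is the usual choice because it makes the adapted model match the base model before any test-time updates are taken.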

Novel Contributions

  • Lowered phased-TTT LoRA learning rate from 1e-4 to 7e-5
  • Non-record H200 screening of a stabilized phased TTT retune
  • Retained the accepted top-stack components while tuning only the TTT LR
  • Reported improved H200 screening val_bpb of 1.05868018
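The only knob changed relative to the accepted top stack is the phased-TTT learning rate. A minimal sketch of the inner update that rate controls, assuming a plain SGD-style step on the LoRA factors; the update rule and tensors are illustrative stand-ins, and only the 1e-4 and 7e-5 values come from this PR:

```python
import numpy as np

# Phased-TTT LoRA learning rate, retuned by this PR.
TTT_LR_OLD = 1e-4
TTT_LR_NEW = 7e-5

def ttt_sgd_step(params, grads, lr=TTT_LR_NEW):
    """One illustrative test-time-training update on the LoRA factors.

    Only the adapter tensors are updated; the base weights stay frozen.
    Plain SGD here is a stand-in for whatever inner optimizer the
    actual stack uses.
    """
    return {name: p - lr * grads[name] for name, p in params.items()}

# rank=80 matches the listed LoRA TTT parameters; other shapes are illustrative.
params = {"lora_A": np.ones((80, 512)), "lora_B": np.zeros((512, 80))}
grads = {name: np.ones_like(p) for name, p in params.items()}
updated = ttt_sgd_step(params, grads)
# Each entry moves by exactly lr (7e-5) per unit of gradient.
```

Lowering the inner-loop rate by 30% shrinks each test-time step proportionally, which is consistent with the PR's framing of the retune as a stabilization.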