PR #2033

open

[non-record] Stabilized phased TTT LR retune - H200 screening val_bpb 1.05868

by ayushozha
val_bpb: 1.0587
Architecture: Transformer
Optimizer:
Artifact Size: 15,902,950 bytes

Training Techniques

Test-Time Training
  • LoRA TTT: parameters {"learning_rate":0.00007,"rank":80}

Architecture
  • SmearGate: BOS-fixed SmearGate path inherited from the accepted top-stack submission (parameters: null)
  • weight tying: inherited from the accepted top-stack submission (parameters: null)

Compression
  • per-group lrzip (level: null)

Regularization
  • weight decay: parameters {"value":0.3}
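The LoRA TTT entry adapts a frozen transformer at test time through low-rank factors. A minimal sketch of the standard LoRA parameterization, assuming a plain `W + (alpha/r) * B @ A` decomposition; only the rank of 80 comes from the listed parameters, while the layer dimensions, scaling, and initialization here are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Linear layer with a LoRA adapter: x @ (W + (alpha/r) * B @ A).T.

    W is the frozen base weight (out_dim, in_dim); A (rank, in_dim) and
    B (out_dim, rank) are the small trainable factors updated at test time.
    """
    rank = A.shape[0]
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

# Illustrative dimensions; only rank=80 comes from the listed parameters.
rng = np.random.default_rng(0)
in_dim, out_dim, rank = 512, 512, 80
W = rng.standard_normal((out_dim, in_dim))
A = rng.standard_normal((rank, in_dim))   # down-projection, random init
B = np.zeros((out_dim, rank))             # up-projection, zero init
x = rng.standard_normal((4, in_dim))
y = lora_forward(x, W, A, B)
# With B zero-initialized the adapter starts as an exact no-op: y == x @ W.T
```

Zero-initializing one factor is the usual choice because it makes the adapted model match the base model before any test-time updates are taken.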

Novel Contributions

  • Lowered phased-TTT LoRA learning rate from 1e-4 to 7e-5
  • Non-record H200 screening of a stabilized phased TTT retune
  • Retained the accepted top-stack components while tuning only the TTT LR
  • Reported improved H200 screening val_bpb of 1.05868018
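The only knob changed relative to the accepted top stack is the phased-TTT learning rate. A minimal sketch of the inner update that rate controls, assuming a plain SGD-style step on the LoRA factors; the update rule and tensors are illustrative stand-ins, and only the 1e-4 and 7e-5 values come from this PR:

```python
import numpy as np

# Phased-TTT LoRA learning rate, retuned by this PR.
TTT_LR_OLD = 1e-4
TTT_LR_NEW = 7e-5

def ttt_sgd_step(params, grads, lr=TTT_LR_NEW):
    """One illustrative test-time-training update on the LoRA factors.

    Only the adapter tensors are updated; the base weights stay frozen.
    Plain SGD here is a stand-in for whatever inner optimizer the
    actual stack uses.
    """
    return {name: p - lr * grads[name] for name, p in params.items()}

# rank=80 matches the listed LoRA TTT parameters; other shapes are illustrative.
params = {"lora_A": np.ones((80, 512)), "lora_B": np.zeros((512, 80))}
grads = {name: np.ones_like(p) for name, p in params.items()}
updated = ttt_sgd_step(params, grads)
# Each entry moves by exactly lr (7e-5) per unit of gradient.
```

Lowering the inner-loop rate by 30% shrinks each test-time step proportionally, which is consistent with the PR's framing of the retune as a stabilization.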