PR #1701

open

Add non-record SP8192 trajectory readout submission

val_bpb: 1.1016
Architecture: Transformer
Optimizer: Muon
Artifact Size: 16,064,921 bytes

Training Techniques

Architecture
depth recurrence
Recurrent SP8192 backbone that loops over the final block and stores the hidden state from each visit for a trajectory readout.
parameters: {"num_loops":2,"loop_start":3,"loop_end":5}
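The loop parameters above can be read as a layer visiting order. The following is a minimal sketch, not the submission's code. It assumes that loop_start and loop_end are inclusive layer indices, that the looped segment is traversed num_loops times in total, and that the model has 8 layers (a hypothetical depth; the PR summary does not state it). The helper names layer_schedule and visits_of are illustrative.

```python
def layer_schedule(num_layers, loop_start, loop_end, num_loops):
    """Return the order in which layer indices are visited.

    Assumptions (not confirmed by the PR summary): loop_start..loop_end
    is an inclusive range, and that range is traversed num_loops times
    in total before the remaining layers run.
    """
    if not 0 <= loop_start <= loop_end < num_layers:
        raise ValueError("loop range must lie inside the layer stack")
    if num_loops < 1:
        raise ValueError("num_loops must be at least 1")
    prefix = list(range(loop_start))                # layers before the loop
    segment = list(range(loop_start, loop_end + 1)) # looped layers
    suffix = list(range(loop_end + 1, num_layers))  # layers after the loop
    return prefix + segment * num_loops + suffix


def visits_of(schedule, layer):
    """Positions in the schedule at which `layer` runs.

    A trajectory readout would store one hidden state per such visit.
    """
    return [pos for pos, idx in enumerate(schedule) if idx == layer]


# num_layers=8 is a hypothetical depth chosen for illustration.
schedule = layer_schedule(num_layers=8, loop_start=3, loop_end=5, num_loops=2)
print(schedule)
print(visits_of(schedule, 5))
```

Under these assumptions the last layer of the looped segment runs once per pass, so num_loops=2 yields two stored states for the readout.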
other
Grouped trajectory-delta readout on the final looped block; it corrects the final hidden state using the hidden states from earlier recurrent passes.
parameters: {"groups":16,"scale":0.35}
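One way to picture a grouped trajectory-delta correction is sketched below. The exact formula is not given in the PR summary, so everything here is an assumption: the hidden dimension is split into contiguous groups, each earlier pass contributes one weight per group, and the correction is the scaled, weighted difference between the final state and each earlier state. The function name and the weights argument are hypothetical.

```python
def grouped_trajectory_readout(states, weights, groups=16, scale=0.35):
    """Correct the final hidden state with deltas to earlier pass states.

    states:  list of hidden vectors, one per visit of the looped block,
             ordered from first pass to final pass.
    weights: one list of `groups` coefficients per earlier pass
             (hypothetical learned parameters).

    Assumed formula: out[i] = final[i]
        + scale * sum_k weights[k][group(i)] * (final[i] - states[k][i])
    """
    final = states[-1]
    dim = len(final)
    if dim % groups != 0:
        raise ValueError("hidden size must be divisible by groups")
    if len(weights) != len(states) - 1:
        raise ValueError("need one weight vector per earlier pass")
    group_size = dim // groups
    out = list(final)
    for earlier, w in zip(states[:-1], weights):
        for i in range(dim):
            g = i // group_size  # contiguous channel groups
            out[i] += scale * w[g] * (final[i] - earlier[i])
    return out


# Two passes over a 32-dimensional state, 16 groups of 2 channels each.
first_pass = [0.0] * 32
final_pass = [1.0] * 32
corrected = grouped_trajectory_readout(
    [first_pass, final_pass], weights=[[1.0] * 16]
)
print(corrected[0])
```

With all weights at zero the readout returns the final state unchanged, which makes the correction easy to switch on gradually.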
Test-Time Training
full TTT
parameters: {"learning_rate":0.005,"epochs":3}
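The test-time training settings above can be illustrated on a toy problem. This sketch is not the submission's evaluation loop. It assumes a score-then-adapt protocol (each chunk is scored with the current weights before any update on it) and uses a hypothetical one-parameter model y = w * x with squared error in place of the language model.

```python
def ttt_eval(w, chunks, learning_rate=0.005, epochs=3):
    """Score each chunk, then adapt the weight on that chunk.

    Toy stand-in for full test-time training: the only parameter is w,
    the model is y = w * x, and the loss is squared error. Scoring
    happens before the update, so no example is evaluated after the
    model has trained on it (an assumed protocol).
    """
    total_loss, count = 0.0, 0
    for chunk in chunks:
        # 1) Score the chunk with the current weight.
        for x, y in chunk:
            total_loss += (w * x - y) ** 2
            count += 1
        # 2) Adapt on the chunk that was just scored.
        for _ in range(epochs):
            grad = sum(2 * (w * x - y) * x for x, y in chunk) / len(chunk)
            w -= learning_rate * grad
    return total_loss / count, w


# Two chunks drawn from y = 2x; the weight starts at 0.
chunk = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
adapted_loss, adapted_w = ttt_eval(0.0, [chunk, chunk])
frozen_loss, _ = ttt_eval(0.0, [chunk, chunk], learning_rate=0.0)
print(adapted_loss < frozen_loss, adapted_w)
```

The second chunk is scored with a weight that has already moved toward the target, which is why the adapted loss falls below the frozen baseline.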
Quantization
GPTQ
bits: null
scope: model
Optimizer
Muon
weight_decay: null
momentum: 0.98
other_params: null
Compression
brotli
level: null
Sequence Length
sequence_length
train_length: 8192
eval_length: null
LR Schedule
late loop onset
parameters: {"enable_looping_at_step":2600}
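Late loop onset can be expressed as a step-dependent loop count. The sketch below assumes that the looped segment runs a single time before the onset step and that looping is active from step 2600 inclusive; the PR summary does not say whether the boundary step itself is included. The function name active_num_loops is hypothetical.

```python
def active_num_loops(step, num_loops=2, enable_looping_at_step=2600):
    """Number of passes over the looped segment at a given training step.

    Assumptions: before the onset step the segment runs once (no
    recurrence), and the onset step itself already uses looping.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return num_loops if step >= enable_looping_at_step else 1


for step in (0, 2599, 2600, 5000):
    print(step, active_num_loops(step))
```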

Novel Contributions

  • Non-record SP8192 trajectory readout on the final looped block
  • Grouped correction using earlier recurrent pass hidden states
  • Late loop onset at step 2600
  • Full test-time training with readout-focused recurrent adaptation