PR #1701

open

Add non-record SP8192 trajectory readout submission

val_bpb

1.1016

Architecture

Transformer

Optimizer

Muon

Artifact Size

16,064,921 bytes

Training Techniques

Architecture

depth recurrence

Recurrent SP8192 backbone that loops over the final block and stores the hidden state from each visit for a trajectory readout.

parameters: {"num_loops":2,"loop_start":3,"loop_end":5}
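One way to read the loop parameters above is as a layer visiting order. The sketch below is illustrative only: it assumes that `loop_start` and `loop_end` are inclusive layer indices and that `num_loops` counts the total passes through that range. Neither assumption is confirmed by the submission, and `num_layers=8` is a hypothetical depth chosen for the example.

```python
def layer_schedule(num_layers, loop_start, loop_end, num_loops, looping_enabled=True):
    """Return the order in which layer indices are visited in one forward pass.

    Assumes an inclusive [loop_start, loop_end] range and that num_loops is the
    total number of passes through it (both are assumptions, not confirmed).
    """
    if not (0 <= loop_start <= loop_end < num_layers):
        raise ValueError("loop range must lie inside the layer stack")
    prefix = list(range(0, loop_start))
    looped = list(range(loop_start, loop_end + 1))
    suffix = list(range(loop_end + 1, num_layers))
    passes = num_loops if looping_enabled else 1
    return prefix + looped * passes + suffix


# num_layers=8 is a hypothetical depth used only for illustration.
schedule = layer_schedule(num_layers=8, loop_start=3, loop_end=5, num_loops=2)
print(schedule)  # [0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7]
```

With looping disabled the schedule collapses to a plain feed-forward pass over layers 0 to 7.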

other

Grouped trajectory-delta readout on the final looped block that corrects the final hidden state using the hidden states from earlier recurrent passes.

parameters: {"groups":16,"scale":0.35}
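The readout described above could take roughly the following shape. This is a minimal sketch, not the submission's code: it assumes the correction is a gated sum of differences between the final state and each earlier visit, with one gate per channel group, multiplied by `scale`. The `gates` argument and the function name are hypothetical, and the toy example uses 2 groups rather than 16 to stay readable.

```python
def trajectory_delta_readout(states, gates, groups=16, scale=0.35):
    """Correct the final hidden state with gated deltas from earlier visits.

    states: one hidden vector per visit of the looped block; the last is final.
    gates:  gates[k][g] weights the delta to earlier visit k inside channel
            group g (hypothetical learned parameters).
    """
    final = states[-1]
    dim = len(final)
    if dim % groups != 0:
        raise ValueError("hidden size must be divisible by the number of groups")
    group_size = dim // groups
    output = list(final)
    for k, earlier in enumerate(states[:-1]):
        for channel in range(dim):
            g = channel // group_size
            delta = final[channel] - earlier[channel]
            output[channel] += scale * gates[k][g] * delta
    return output


# Toy example: 4 channels in 2 groups, two visits of the looped block.
states = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
gates = [[1.0, 0.0]]  # only the first group receives a correction
corrected = trajectory_delta_readout(states, gates, groups=2, scale=0.35)
```

Sharing one gate across a group keeps the added parameter count small: one value per group per earlier visit, rather than one per channel.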

Test-Time Training

full TTT

parameters: {"learning_rate":0.005,"epochs":3}
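To show how the two test-time training parameters interact, here is a toy sketch on a one-weight linear model. It assumes a score-then-adapt order (each chunk is scored before the model trains on it) and plain gradient descent. Both are assumptions made for illustration, and the submission's actual procedure and optimizer are not stated here. The function and variable names are hypothetical.

```python
def adapt_at_test_time(weight, chunks, learning_rate=0.005, epochs=3):
    """Score each chunk, then adapt the weight on it with plain gradient descent."""
    scored_losses = []
    for chunk in chunks:
        # Score first, so the reported loss never uses a chunk already trained on.
        loss = sum((weight * x - y) ** 2 for x, y in chunk) / len(chunk)
        scored_losses.append(loss)
        # Then take `epochs` full-batch gradient steps on the same chunk.
        for _ in range(epochs):
            grad = sum(2 * (weight * x - y) * x for x, y in chunk) / len(chunk)
            weight -= learning_rate * grad
    return weight, scored_losses


# Toy data drawn from y = 2x, repeated as four evaluation chunks.
chunks = [[(1.0, 2.0), (2.0, 4.0)]] * 4
adapted, losses = adapt_at_test_time(0.0, chunks)
```

In this toy setting the scored loss falls from chunk to chunk because every update moves the weight toward 2.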

Quantization

GPTQ

bits: null

scope: model

Optimizer

Muon

weight_decay: null

momentum: 0.98

other_params: null

Compression

brotli

level: null

Sequence Length

sequence_length

train_length: 8192

eval_length: null

LR Schedule

late loop onset

parameters: {"enable_looping_at_step":2600}
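The late loop onset can be pictured as a step-gated switch. The sketch below assumes looping turns on at the first step greater than or equal to `enable_looping_at_step` and that the model runs a single pass through the looped range before then. The exact comparison and the helper names are assumptions.

```python
def looping_enabled(step, enable_looping_at_step=2600):
    """Return True once training has reached the loop onset step."""
    return step >= enable_looping_at_step


def passes_through_loop(step, num_loops=2, enable_looping_at_step=2600):
    """Number of passes through the looped range at a given training step."""
    return num_loops if looping_enabled(step, enable_looping_at_step) else 1
```

Under these assumptions, steps 0 to 2599 train the plain feed-forward stack, and the recurrent passes start at step 2600.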