PR #662

open

Add non-record streaming legal TTT late-block submission

by simon-marcus

val_bpb

1.1208

Architecture

Transformer

Optimizer

Muon

Artifact Size

15294320 bytes

Training Techniques

Architecture

tied embeddings

Uses a leader-core merge candidate with tied embedding setup as part of the base model stack.

parameters: null

LeakyReLU

Uses LeakyReLU(0.5)^2 activation.

parameters: {"negative_slope":0.5,"power":2}
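As a rough illustration of the activation named above, the sketch below applies a leaky ReLU with negative slope 0.5 and then raises the result to the power 2. The order of operations (leak first, then square) is an assumption read off the notation LeakyReLU(0.5)^2, and the function name is hypothetical rather than taken from the PR.

```python
def leaky_relu_squared(x, negative_slope=0.5, power=2):
    """Apply a leaky ReLU, then raise the result to `power`.

    Assumption: the power is applied after the leaky ReLU, as the
    notation LeakyReLU(0.5)^2 suggests.
    """
    leaky = x if x >= 0 else negative_slope * x
    return leaky ** power


def leaky_relu_squared_all(values, negative_slope=0.5, power=2):
    """Element-wise version for a plain Python list of floats."""
    return [leaky_relu_squared(v, negative_slope, power) for v in values]
```

With an even power the output is non-negative everywhere, so negative inputs contribute a damped but non-zero positive response instead of being zeroed out.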

Quantization

int8

bits: 8

scope: final artifact / local export

Compression

zlib

level: null
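The int8 export followed by zlib compression could look roughly like the sketch below. It assumes symmetric per-tensor quantization with a single float scale and the default zlib level (the level is listed as null above); the PR may well use per-row scales or another layout, and every function name here is hypothetical.

```python
import zlib
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme).

    Returns the int8 values and the float scale needed to undo them.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def export_blob(weights, level=-1):
    """Quantize, then compress the raw int8 bytes with zlib.

    level=-1 selects zlib's default compression level.
    """
    q, scale = quantize_int8(weights)
    return zlib.compress(q.tobytes(), level), scale


def import_blob(blob, scale):
    """Decompress and dequantize back to floats."""
    q = array("b")
    q.frombytes(zlib.decompress(blob))
    return [v * scale for v in q]
```

Under this scheme the round-trip error per weight is at most half the scale, and the byte count of the compressed blob is what would count toward an artifact size.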

Evaluation

sliding window eval

parameters: null
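No parameters are listed for the sliding window eval, so the following is only a generic sketch of the technique: each window carries up to `context_len` tokens of context, but only the tokens not yet scored by an earlier window count toward the metric. The names `context_len` and `stride` and the function itself are assumptions, not settings from the PR.

```python
def sliding_windows(num_tokens, context_len, stride):
    """Plan evaluation windows so every token is scored exactly once.

    Returns a list of (window_start, window_end, score_start) triples:
    the model sees tokens [window_start, window_end) and only the
    tokens in [score_start, window_end) are added to the loss.
    """
    if not 0 < stride <= context_len:
        raise ValueError("stride must be in (0, context_len]")
    spans = []
    scored_upto = 0
    while scored_upto < num_tokens:
        end = min(scored_upto + stride, num_tokens)
        start = max(0, end - context_len)
        spans.append((start, end, scored_upto))
        scored_upto = end
    return spans
```

A smaller stride gives each scored token more left context at the cost of more forward passes.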

Test-Time Training

streaming legal TTT

parameters: {"TTT_MODE":"stream","TTT_PARAM_MODE":"late_blocks","TTT_LAST_N_BLOCKS":4}
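One way to read the late_blocks setting is that only parameters belonging to the last TTT_LAST_N_BLOCKS transformer blocks are handed to the test-time optimizer. The sketch below selects them by name. The `blocks.<index>.` naming pattern, the `num_blocks` argument, and the function name are all assumptions made for illustration; only the three config keys come from the submission.

```python
import re

TTT_CONFIG = {
    "TTT_MODE": "stream",
    "TTT_PARAM_MODE": "late_blocks",
    "TTT_LAST_N_BLOCKS": 4,
}


def select_ttt_params(param_names, num_blocks, config=TTT_CONFIG):
    """Return the parameter names that would be updated during TTT.

    Assumes block parameters are named "blocks.<index>.<rest>".
    """
    if config["TTT_PARAM_MODE"] != "late_blocks":
        raise ValueError("this sketch only covers late_blocks mode")
    first_trainable = max(0, num_blocks - config["TTT_LAST_N_BLOCKS"])
    pattern = re.compile(r"^blocks\.(\d+)\.")
    selected = []
    for name in param_names:
        match = pattern.match(name)
        if match and int(match.group(1)) >= first_trainable:
            selected.append(name)
    return selected
```

Embeddings and earlier blocks stay frozen under this reading, which keeps the per-step adaptation cost bounded by the size of the last four blocks.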

LR Schedule

warmdown

parameters: {"warmdown_steps":800}
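A warmdown of 800 steps is commonly implemented as a learning-rate multiplier that stays at 1.0 and then decays over the final 800 training steps. The sketch below assumes a linear decay to zero; the decay shape, the `total_steps` argument, and the function name are assumptions, since only `warmdown_steps` is given.

```python
def lr_scale(step, total_steps, warmdown_steps=800):
    """Learning-rate multiplier with a linear warmdown at the end.

    Returns 1.0 before the warmdown begins, then falls linearly to
    0.0 at `total_steps` (assumed shape).
    """
    if warmdown_steps <= 0:
        return 1.0
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0
    return max(0.0, (total_steps - step) / warmdown_steps)
```

For a hypothetical 2000-step run, the multiplier would hold at 1.0 through step 1200 and reach 0.5 at step 1600.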

Optimizer

Muon

weight_decay: null

momentum: 0.99

other_params: null

Non-record streaming legal TTT submission for comparison against the March 23 leader
Switches eval-time adaptation from chunked score-first legal TTT to streaming legal TTT
Updates only the last 4 blocks during TTT via late-block mode
Includes explicit preflight and run logs for reproducibility
Provides a full 8xH100 run and local int8 export with zlib compression
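
The score-first ordering mentioned above can be illustrated with a toy model. In the sketch below, a byte-level unigram counter stands in for the transformer: each segment of the stream is scored with the model as it was before seeing that segment, and the model adapts only afterwards. This is an assumed reading of "score-first" and not the PR's code; the class, the function, and `segment_len` are hypothetical, and the sketch does not model how streaming differs from chunked mode.

```python
import math


class ByteUnigram:
    """Toy adaptive model: Laplace-smoothed byte counts."""

    def __init__(self):
        self.counts = [1] * 256
        self.total = 256

    def bits(self, data):
        """Total code length of `data` in bits under the current counts."""
        return sum(-math.log2(self.counts[b] / self.total) for b in data)

    def update(self, data):
        """Adapt to `data` after it has been scored."""
        for b in data:
            self.counts[b] += 1
        self.total += len(data)


def streaming_score_first_bpb(stream, segment_len):
    """Bits per byte when every segment is scored before it is learned."""
    if not stream:
        raise ValueError("stream must be non-empty")
    model = ByteUnigram()
    total_bits = 0.0
    for start in range(0, len(stream), segment_len):
        segment = stream[start:start + segment_len]
        total_bits += model.bits(segment)  # score with the pre-update model
        model.update(segment)              # adapt only afterwards
    return total_bits / len(stream)
```

Because no byte influences the model before it is scored, the resulting bits-per-byte figure remains a valid compression measure even though the model changes during evaluation.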