val_bpb
1.1208
Architecture
Transformer
Optimizer
Muon
Artifact Size
15294320 bytes
Training Techniques
Architecture
tied embeddings
The base model stack is a leader-core merge candidate with tied embeddings (input embedding and output projection share weights).
parameters: null
LeakyReLU
Uses a squared LeakyReLU activation, LeakyReLU(0.5)^2: negative slope 0.5, with the output raised to the power 2.
parameters: {"negative_slope":0.5,"power":2}
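As a minimal scalar sketch of the LeakyReLU(0.5)^2 activation described above: the function name is hypothetical, and it assumes the power is applied directly to the LeakyReLU output without preserving sign, so negative inputs map to non-negative outputs. The submission's actual tensor implementation may differ.

```python
def leaky_relu_squared(x: float, negative_slope: float = 0.5, power: int = 2) -> float:
    """Apply LeakyReLU with the given slope, then raise the result to `power`.

    Assumption: a plain power with no sign preservation.
    """
    y = x if x >= 0.0 else negative_slope * x
    return y ** power


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(value, leaky_relu_squared(value))
```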
Quantization
int8
bits: 8
scope: final artifact / local export
Compression
zlib
level: null
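The int8 export followed by zlib compression can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: symmetric per-tensor scaling, the byte layout (a float32 scale followed by the int8 values), and the function names. Because the compression level is recorded as null, the sketch uses zlib's default level.

```python
import struct
import zlib


def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127] (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quantized


def pack_and_compress(values, level=-1):
    """Serialize as float32 scale + int8 payload, then zlib-compress (-1 = default level)."""
    scale, quantized = quantize_int8(values)
    raw = struct.pack("<f", scale) + struct.pack(f"<{len(quantized)}b", *quantized)
    return zlib.compress(raw, level)


def decompress_and_dequantize(blob):
    """Invert pack_and_compress, returning approximate float values."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack_from("<f", raw, 0)
    quantized = struct.unpack_from(f"<{len(raw) - 4}b", raw, 4)
    return [scale * q for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    blob = pack_and_compress(weights)
    print(len(blob), decompress_and_dequantize(blob))
```

The round trip is lossy: each restored value differs from the original by at most about half a quantization step.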
Evaluation
sliding window eval
parameters: null
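Since no parameters are recorded for the sliding window evaluation, the following is only a generic sketch of the technique. The context length, stride, and function name are hypothetical. Each window gives the model up to `context_len` tokens of context, but only the newest `stride` tokens are scored, so every token is scored exactly once.

```python
def sliding_windows(num_tokens, context_len, stride):
    """Yield (start, end, score_from) triples.

    The model sees tokens[start:end]; only tokens[score_from:end] count toward the loss.
    """
    if not 0 < stride <= context_len:
        raise ValueError("stride must be in (0, context_len]")
    score_from = 0
    while score_from < num_tokens:
        end = min(score_from + stride, num_tokens)
        start = max(0, end - context_len)
        yield start, end, score_from
        score_from = end


if __name__ == "__main__":
    for window in sliding_windows(num_tokens=10, context_len=4, stride=2):
        print(window)
```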
Test-Time Training
streaming legal TTT
parameters: {"TTT_MODE":"stream","TTT_PARAM_MODE":"late_blocks","TTT_LAST_N_BLOCKS":4}
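The TTT parameters above can be read as follows, in a hedged sketch rather than the submission's code. It assumes that "late_blocks" means only the final `TTT_LAST_N_BLOCKS` transformer blocks are updated, that any other mode updates all blocks, and that streaming mode keeps the score-before-update ordering. The block count, function names, and callbacks are hypothetical.

```python
TTT_CONFIG = {"TTT_MODE": "stream", "TTT_PARAM_MODE": "late_blocks", "TTT_LAST_N_BLOCKS": 4}


def trainable_block_indices(num_blocks, config):
    """Return indices of blocks updated during TTT (assumed meaning of 'late_blocks')."""
    if config["TTT_PARAM_MODE"] != "late_blocks":
        return list(range(num_blocks))  # assumption: other modes update every block
    n = min(config["TTT_LAST_N_BLOCKS"], num_blocks)
    return list(range(num_blocks - n, num_blocks))


def stream_score_then_update(segments, score_fn, update_fn):
    """Score each segment before adapting on it.

    No segment is scored by weights that have already been trained on it.
    """
    losses = []
    for segment in segments:
        losses.append(score_fn(segment))  # evaluate first
        update_fn(segment)                # then adapt on the same segment
    return losses


if __name__ == "__main__":
    # num_blocks=12 is a hypothetical depth chosen for illustration only.
    print(trainable_block_indices(12, TTT_CONFIG))
```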
LR Schedule
warmdown
parameters: {"warmdown_steps":800}
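A warmdown schedule with 800 steps can be sketched as below. Only `warmdown_steps` comes from the record. The linear decay to zero, the total step count, the base learning rate, and the function name are assumptions for illustration.

```python
def warmdown_lr(step, total_steps, base_lr, warmdown_steps=800):
    """Hold base_lr, then decay linearly to zero over the final warmdown_steps (assumed shape)."""
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return base_lr
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / warmdown_steps


if __name__ == "__main__":
    # total_steps=2000 and base_lr=1.0 are hypothetical values.
    for step in (0, 1200, 1600, 2000):
        print(step, warmdown_lr(step, total_steps=2000, base_lr=1.0))
```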
Optimizer
Muon
weight_decay: null
momentum: 0.99
other_params: null
Novel Contributions
- Non-record streaming legal TTT submission for comparison against the March 23 leader
- Switches eval-time adaptation from chunked score-first legal TTT to streaming legal TTT
- Updates only the last 4 blocks during TTT via late-block mode
- Includes explicit preflight and run logs for reproducibility
- Provides a full 8xH100 run and local int8 export with zlib compression