val_bpb
1.0976
Architecture
Transformer
Optimizer
Muon
Artifact Size
16,071,408 bytes
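For context on the val_bpb figure, here is a minimal sketch of how a bits-per-byte score is commonly derived from a summed cross-entropy loss. It assumes val_bpb means validation bits per byte and that the loss is accumulated in nats. The function name and arguments are illustrative, not taken from the submission.

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) into bits per byte.

    Assumption: the loss is summed over all predicted tokens and
    total_bytes counts the raw bytes of the evaluated text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens keeps scores comparable across tokenizers with different vocabulary sizes.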
Training Techniques
Weight Averaging
EMA
parameters: null
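The card lists no EMA parameters, so the following is only a generic sketch of exponential-moving-average weight averaging. The decay value and the flat list-of-floats weight representation are assumptions chosen for illustration.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights.

    The decay default is a hypothetical value; the card reports none.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

# Toy usage: two training steps with identical weights and decay 0.5.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
```

In practice the averaged weights, not the raw training weights, are typically the ones exported for evaluation.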
Quantization
GPTQ
bits: null
scope: all
Test-Time Training
readout_only
parameters: {"learning_rate":0.005,"epochs":3,"prefix_chunk_ratio":0.2,"prefix_epochs":4,"prefix_lr_scale":1.15,"prefix_hard_window_fraction":0.25}
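As a rough illustration of what these parameters might imply per chunk, the sketch below derives a prefix length and a scaled prefix learning rate. It assumes prefix_chunk_ratio is the fraction of each chunk treated as the prefix and that prefix_lr_scale multiplies learning_rate. Neither reading is confirmed by the card. prefix_hard_window_fraction is left uninterpreted, and the chunk length of 1000 tokens is hypothetical.

```python
import json

# Parameters copied verbatim from the card.
ttt = json.loads(
    '{"learning_rate":0.005,"epochs":3,"prefix_chunk_ratio":0.2,'
    '"prefix_epochs":4,"prefix_lr_scale":1.15,'
    '"prefix_hard_window_fraction":0.25}'
)

def prefix_plan(chunk_len: int, cfg: dict) -> dict:
    """Derive a per-chunk plan under the assumptions stated in the lead-in."""
    prefix_len = int(chunk_len * cfg["prefix_chunk_ratio"])
    return {
        "prefix_len": prefix_len,
        "rest_len": chunk_len - prefix_len,
        "prefix_lr": cfg["learning_rate"] * cfg["prefix_lr_scale"],
        "prefix_epochs": cfg["prefix_epochs"],
    }

plan = prefix_plan(1000, ttt)  # hypothetical chunk length
```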
Evaluation
sliding window eval
parameters: null
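The card gives no window or stride, so this is a generic sketch of sliding-window evaluation rather than the submission's code. Each window supplies context, and only tokens not yet scored by an earlier window are counted. The function name and the example sizes are assumptions.

```python
def sliding_windows(n_tokens: int, window: int, stride: int):
    """Yield (start, end, score_from) triples.

    The model sees tokens [start, end) as context, and only tokens
    [score_from, end) contribute to the loss, so each token is scored once.
    """
    if not 0 < stride <= window:
        raise ValueError("require 0 < stride <= window")
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + window, n_tokens)
        yield start, end, scored_upto
        scored_upto = end
        start += stride

spans = list(sliding_windows(10, window=4, stride=2))
```

A smaller stride gives each scored token more left context at the cost of more forward passes.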
Architecture
weight tying
Uses tied embeddings (tied weights), as implied by the Parcae stack configuration.
parameters: null
Other
other
Phased prefix TTT: spends extra adaptation budget on a scored prefix of each chunk, then uses the improved readout state for the remainder of that chunk.
parameters: {"loop_inject_enabled":1,"use_pass_readout":1,"readout_groups":16,"readout_scale":0.35}
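The control flow described for phased prefix TTT can be sketched with a toy model. Here the "readout" is a single scalar bias fitted by gradient descent on squared error, which stands in for the real readout stack and is purely an assumption. The prefix is scored before any adaptation, the readout is then adapted on that prefix, and the remainder is scored with the adapted state. The default values reuse the TTT parameters listed in this card.

```python
def mse(bias, xs):
    """Mean squared error of a constant prediction."""
    return sum((x - bias) ** 2 for x in xs) / len(xs)

def phased_prefix_ttt(chunk, bias=0.0, prefix_ratio=0.2,
                      prefix_epochs=4, lr=0.005 * 1.15):
    """Toy two-phase evaluation of one chunk (illustrative only)."""
    split = max(1, int(len(chunk) * prefix_ratio))
    prefix, rest = chunk[:split], chunk[split:]
    # Phase 1: score the prefix with the unadapted readout.
    prefix_loss = mse(bias, prefix)
    # Phase 2: adapt only the readout parameter on the scored prefix.
    for _ in range(prefix_epochs):
        grad = sum(2.0 * (bias - x) for x in prefix) / len(prefix)
        bias -= lr * grad
    # Phase 3: score the remainder with the improved readout state.
    rest_loss = mse(bias, rest) if rest else 0.0
    return prefix_loss, rest_loss, bias

prefix_loss, rest_loss, adapted_bias = phased_prefix_ttt([1.0] * 10)
```

Scoring the prefix before adapting on it means no token is scored by a model that has already trained on it.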
Novel Contributions
- Non-record SP8192 Parcae submission using a trajectory readout stack
- Readout-only test-time training (TTT)
- Phased prefix TTT on a scored prefix of each chunk
- Evaluation-time overlay that reuses the improved readout state for the rest of the chunk
- Artifact compression with brotli