PR #1704

open

Add non-record SP8192 Parcae prefix TTT readout-only submission

val_bpb
1.0976
Architecture
Transformer
Optimizer
Muon
Artifact Size
16,071,408 bytes

Training Techniques

Weight Averaging
EMA
parameters: null
Quantization
GPTQ
bits: null
scope: all
Test-Time Training
readout_only
parameters: {"learning_rate":0.005,"epochs":3,"prefix_chunk_ratio":0.2,"prefix_epochs":4,"prefix_lr_scale":1.15,"prefix_hard_window_fraction":0.25}
Evaluation
sliding window eval
parameters: null
Architecture
weight tying
Uses tied input and output embedding weights, as implied by the Parcae stack configuration.
parameters: null
Other
other
Phased prefix TTT that spends extra adaptation budget on a scored prefix of each chunk before using the improved readout state for the remainder.
parameters: {"loop_inject_enabled":1,"use_pass_readout":1,"readout_groups":16,"readout_scale":0.35}
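The phased prefix schedule can be made concrete with a small sketch. It only illustrates how the listed TTT parameters might combine, under assumptions that the PR summary does not confirm: that prefix_chunk_ratio is the fraction of each chunk's tokens treated as the prefix, that prefix_lr_scale multiplies the base learning_rate during the prefix phase, and that the remainder of the chunk runs at the base settings. The names PrefixTTTConfig and phase_plan are hypothetical, and prefix_hard_window_fraction is left out because its meaning is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixTTTConfig:
    # Values copied from the Test-Time Training parameters above.
    learning_rate: float = 0.005
    epochs: int = 3
    prefix_chunk_ratio: float = 0.2
    prefix_epochs: int = 4
    prefix_lr_scale: float = 1.15


def phase_plan(chunk_len: int, cfg: PrefixTTTConfig = PrefixTTTConfig()) -> dict:
    """Split one chunk into a prefix phase and a remainder phase.

    ASSUMPTION: the prefix is the first prefix_chunk_ratio of the chunk's
    tokens and is adapted with a scaled learning rate for prefix_epochs;
    the remainder uses the base learning rate and epoch count.
    """
    prefix_len = int(chunk_len * cfg.prefix_chunk_ratio)
    return {
        "prefix": {
            "tokens": (0, prefix_len),
            "epochs": cfg.prefix_epochs,
            "lr": cfg.learning_rate * cfg.prefix_lr_scale,
        },
        "remainder": {
            "tokens": (prefix_len, chunk_len),
            "epochs": cfg.epochs,
            "lr": cfg.learning_rate,
        },
    }


plan = phase_plan(1000)
print(plan["prefix"]["tokens"], plan["remainder"]["tokens"])
```

Under these assumptions, a 1000-token chunk would spend 4 epochs on its first 200 tokens at a learning rate of roughly 0.00575 before moving on to the remaining 800 tokens.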

Novel Contributions

  • Non-record SP8192 Parcae submission using a trajectory readout stack
  • Readout-only test-time training adaptation
  • Phased prefix TTT on a scored prefix of each chunk
  • Evaluation-time overlay that reuses improved readout state for the rest of the chunk
  • Artifact compression with Brotli