val_bpb
1.6429
Architecture
Hybrid
Optimizer
Muon
Artifact Size
—
Training Techniques
Architecture
depth recurrence
Pure SSM trunk with CareSSM diagonal recurrent blocks; live episodic memory is active during both training and evaluation.
parameters: null
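As a rough illustration of what "diagonal recurrent" usually means in an SSM context, the sketch below runs a generic elementwise linear recurrence h_t = a * h_{t-1} + b * x_t, where each state channel evolves independently. This is not the CareSSM block itself: the function name `diagonal_recurrence` and the parameters `decay` and `in_gain` are hypothetical, and gating, discretization, and output projections are left out.

```python
def diagonal_recurrence(inputs, decay, in_gain):
    """Generic diagonal linear recurrence (illustrative, not CareSSM).

    inputs:  list of per-step input vectors, each of length d
    decay:   length-d list of per-channel state decay factors (the diagonal)
    in_gain: length-d list of per-channel input gains
    Returns the list of state vectors, one per step.
    """
    state = [0.0] * len(decay)
    outputs = []
    for x in inputs:
        # Each channel depends only on its own previous state: no cross-channel mixing.
        state = [a * h + b * xi for a, h, b, xi in zip(decay, state, in_gain, x)]
        outputs.append(list(state))
    return outputs


if __name__ == "__main__":
    steps = diagonal_recurrence(
        inputs=[[1.0, 1.0], [0.0, 0.0]],
        decay=[0.5, 0.0],
        in_gain=[1.0, 2.0],
    )
    print(steps)
```

Because the state transition is diagonal, the per-step cost is linear in the state size, which is one common reason to prefer this form over a dense recurrent matrix.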
other
Two GPUs are reserved for memory: a packet-serving rank and a memory-maintenance rank, both operating separately from trunk training.
parameters: {"gpu_6_packet_serving":true,"gpu_7_maintenance":true}
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"semantic_optimizer":"SemanticOptimizer","ssm_channel_coupled_momentum_beta":true}
Evaluation
prequential eval
parameters: {"score_before_write":true,"packet_online_cache":true}
Other
other
Live episodic memory is active during both training and prequential evaluation. Each evaluation step is scored first; episodic writes and cache updates happen only after scoring.
parameters: {"episodic_reads_per_eval":3348,"episodic_writes_per_eval":3348}
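The score-before-write ordering can be sketched with a toy prequential loop. Everything here is a hypothetical stand-in rather than the submission's code: `ToyEpisodicMemory` is a simple next-byte count table, and `prequential_bpb` scores one byte at a time. The point is only the ordering: each byte is scored with memory that has seen the past alone, and the write happens after the score is recorded.

```python
import math
from collections import Counter, defaultdict


class ToyEpisodicMemory:
    """Hypothetical memory: next-byte counts keyed by the previous byte."""

    def __init__(self, alphabet_size=256):
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(Counter)
        self.reads = 0
        self.writes = 0

    def prob(self, context, byte):
        """Add-one-smoothed probability of `byte` given `context` (read only)."""
        self.reads += 1
        seen = self.counts.get(context, Counter())
        total = sum(seen.values())
        return (seen[byte] + 1) / (total + self.alphabet_size)

    def write(self, context, byte):
        """Record that `byte` followed `context`."""
        self.writes += 1
        self.counts[context][byte] += 1


def prequential_bpb(data, memory):
    """Bits per byte where every byte is scored before it is written."""
    if not data:
        raise ValueError("data must be non-empty")
    total_bits = 0.0
    prev = None
    for byte in data:
        p = memory.prob(prev, byte)   # 1) score using past information only
        total_bits += -math.log2(p)
        memory.write(prev, byte)      # 2) write only after the score is fixed
        prev = byte
    return total_bits / len(data)


if __name__ == "__main__":
    mem = ToyEpisodicMemory()
    print(prequential_bpb(b"ab" * 16, mem), mem.reads, mem.writes)
```

In this toy loop there is exactly one read and one write per scored unit, so the two counters end up equal. The card also reports equal read and write counts, but whether that arises for the same reason is an assumption.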
Novel Contributions
- Pure SSM trunk submission for track_10min_16mb
- Live episodic memory used during both training and legal prequential evaluation
- Dedicated GPU packet-serving and memory-maintenance ranks
- Score-before-write packet-online evaluation cache
- CareSSM trunk with CRCT evidence substrate and MultiSlotOuterModel replay/eviction pipeline