| val_bpb | Architecture | Optimizer | Artifact Size |
|---|---|---|---|
| 1.2921 | Transformer | — | 16,034,265 |
Training Techniques

- Architecture: depth recurrence. Layer looping / recurrence disabled by setting `NUM_LOOPS=0` (parameters: `{"num_loops": 0}`)
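A minimal sketch of what the `NUM_LOOPS` knob could control (the toy `block` and all names here are hypothetical, not the submission's code): with `num_loops=0` the layer stack runs exactly once, and each extra loop re-applies the same weight-tied stack.

```python
import os

def block(x):
    """Hypothetical stand-in for a transformer block: a fixed affine map."""
    return 0.5 * x + 1.0

def forward(x, depth, num_loops):
    """Run `depth` blocks once, then `num_loops` extra recurrent passes
    over the same (shared-weight) stack; num_loops=0 disables recurrence."""
    for _ in range(1 + num_loops):
        for _ in range(depth):
            x = x + block(x)  # residual connection around each block
    return x

num_loops = int(os.environ.get("NUM_LOOPS", "0"))  # 0 = ablation setting
y0 = forward(1.0, depth=2, num_loops=0)  # plain feed-forward stack
y1 = forward(1.0, depth=2, num_loops=1)  # one extra weight-tied pass
```

With `num_loops=0` the model does the same work as a standard stack of `depth` blocks; the ablation measures what the extra weight-tied passes were contributing.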
- Weight Averaging: EMA (parameters: null)
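EMA weight averaging keeps a slowly moving shadow copy of the weights and evaluates with that copy instead of the raw weights. A minimal sketch, assuming a per-step blend with a fixed decay (not the submission's actual code):

```python
class EMA:
    """Exponential moving average of model weights: after every optimizer
    step, blend the live weights into a shadow copy; eval uses the shadow."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = [float(p) for p in params]  # averaged copy

    def update(self, params):
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * float(p)
                       for s, p in zip(self.shadow, params)]

ema = EMA([0.0], decay=0.5)  # decay exaggerated for illustration
ema.update([1.0])            # shadow: [0.5]
ema.update([1.0])            # shadow: [0.75]
```

In practice the decay is close to 1 (e.g. 0.999), so the shadow weights lag the training trajectory and smooth out step-to-step noise.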
- Evaluation: sliding window eval (parameters: null)
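Sliding-window evaluation scores a long sequence with a fixed context window by feeding overlapping windows and only scoring the tokens not covered by the previous window, so every scored token keeps long left context. A sketch of the indexing (an assumed scheme; the submission's exact parameters are null above):

```python
def sliding_window_spans(n_tokens, window, stride):
    """Return (input_start, input_end, score_start) triples: each window
    [input_start, input_end) is fed to the model, but only tokens in
    [score_start, input_end) are scored, so scored spans tile the
    sequence without overlap."""
    spans = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans

spans = sliding_window_spans(n_tokens=10, window=4, stride=2)
# → [(0, 4, 0), (2, 6, 4), (4, 8, 6), (6, 10, 8)]
```

Summing the per-token losses over the scored spans and dividing by total bytes gives a bits-per-byte figure like the `val_bpb` reported above.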
- Sequence Length: sequence_length (train_length: null, eval_length: null)
Novel Contributions
- Non-record screening submission scaffold for the SP8192 SOTA stack
- No-looping ablation with depth recurrence disabled
- 5-shard 1×H100 screening setup
- Thin launcher that sets `NUM_LOOPS=0` and runs the base trainer
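The thin launcher in the last bullet can be sketched as follows (a hypothetical reconstruction; `train.py` is a placeholder for the real trainer entry point, not a name from the source):

```python
import os
import subprocess
import sys

def recurrence_off_env(base_env):
    """Copy the given environment and force NUM_LOOPS=0, disabling
    depth recurrence without touching the trainer's code."""
    env = dict(base_env)
    env["NUM_LOOPS"] = "0"
    return env

def launch(trainer="train.py", args=()):
    """Run the unmodified base trainer in the modified environment and
    return its exit code."""
    cmd = [sys.executable, trainer, *args]
    return subprocess.run(cmd, env=recurrence_off_env(os.environ)).returncode
```

Keeping the ablation in the environment rather than in the training script means the base trainer ships unmodified, which is the point of a screening scaffold.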