PR #1764

open

Add non-record no-looping SOTA-stack submission scaffold

by gmn0105
val_bpb
1.2921
Architecture
Transformer
Optimizer
Artifact Size
16,034,265

Training Techniques

Architecture
depth recurrence
Disabled layer looping / recurrence by setting NUM_LOOPS=0.
parameters: {"num_loops":0}
Weight Averaging
EMA
parameters: null
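For readers unfamiliar with the EMA entry above, here is a minimal sketch of exponential-moving-average weight averaging in plain Python. The summary lists no parameters for it, so the decay value, the `ema_update` name, and the dict-of-floats representation are assumptions for illustration and are not taken from the PR.

```python
def ema_update(ema, params, decay):
    """Return the updated EMA weights.

    Each averaged weight becomes decay * old_average + (1 - decay) * current_weight.
    `ema` and `params` are dicts mapping a parameter name to a float
    (hypothetical stand-ins for real tensors).
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return {
        name: decay * ema[name] + (1.0 - decay) * params[name]
        for name in params
    }


# Illustrative values only; the PR does not state a decay.
averaged = {"w": 1.0}
current = {"w": 0.0}
averaged = ema_update(averaged, current, decay=0.5)
```

In a typical setup the averaged copy is updated after each optimizer step and then used for evaluation in place of the raw weights; whether this submission does exactly that is not stated in the summary.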
Evaluation
sliding window eval
parameters: null
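One common way to implement sliding window evaluation is to slide overlapping windows over the validation tokens and score only the tokens that no earlier window has scored, so that most tokens are predicted with a long left context. The sketch below only computes the window boundaries. The function name, window size, and stride are assumptions, since the summary gives no evaluation parameters.

```python
def sliding_windows(num_tokens, window, stride):
    """Return a list of (start, end, score_from) spans.

    The model would be run on tokens [start, end), and only the tokens in
    [score_from, end) would contribute to the loss, so each token is
    scored exactly once.
    """
    if not 0 < stride <= window:
        raise ValueError("stride must satisfy 0 < stride <= window")
    spans = []
    scored_upto = 0  # first token index that has not been scored yet
    start = 0
    while scored_upto < num_tokens:
        end = min(start + window, num_tokens)
        spans.append((start, end, scored_upto))
        scored_upto = end
        if end == num_tokens:
            break
        start += stride
    return spans


# Illustrative sizes only.
spans = sliding_windows(num_tokens=10, window=4, stride=2)
```

A smaller stride gives each scored token more context at the cost of more forward passes.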
Sequence Length
sequence_length
train_length: null
eval_length: null

Novel Contributions

  • Non-record screening submission scaffold for the SP8192 SOTA stack
  • No-looping ablation with depth recurrence disabled
  • 5-shard 1×H100 screening setup
  • Thin launcher that sets NUM_LOOPS=0 and runs the base trainer
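A thin launcher of the kind described in the last bullet could look roughly like the sketch below. It forces NUM_LOOPS=0 in the environment and hands everything else to the base trainer. The trainer filename `train_gpt.py` and the helper names are hypothetical; the summary does not name the actual script or show its arguments.

```python
import os
import subprocess
import sys

# Hypothetical filename: the PR summary does not name the base trainer script.
BASE_TRAINER = "train_gpt.py"


def build_env(base_env=None):
    """Copy the environment and force NUM_LOOPS=0 (looping disabled)."""
    env = dict(os.environ if base_env is None else base_env)
    env["NUM_LOOPS"] = "0"
    return env


def build_command(trainer=BASE_TRAINER, extra_args=()):
    """Build the command line that runs the trainer with the current interpreter."""
    return [sys.executable, trainer, *extra_args]


def main(argv=None):
    """Run the base trainer with NUM_LOOPS=0 and return its exit code."""
    args = sys.argv[1:] if argv is None else list(argv)
    return subprocess.call(build_command(extra_args=args), env=build_env())
```

As a script, this would end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard. Keeping the override in the launcher leaves the base trainer untouched, which suits an ablation.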