PR #1786
open
Research/Ablation: Recurrence schedule sweep on April 5 SP8192 stack (V1 hard vs V2/V3 ramps)
by sachinnchaudhary
val_bpb
1.0918
Architecture
Transformer
Optimizer
Muon
Artifact Size
—
Training Techniques
Optimizer
Muon
weight_decay: null
momentum: null
other_params: null
Architecture
depth recurrence
The recurrence activation schedule was ablated by comparing a hard switch (V1) against a wide ramp (V2) and a later, narrower ramp (V3).
parameters: null
Weight Averaging
EMA
parameters: null
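The EMA parameters are not recorded here, so the following is only a generic sketch of exponential moving averaging over weights. The function name `ema_update` and the decay value of 0.9 are illustrative assumptions, not taken from the PR.

```python
from typing import List


def ema_update(avg: List[float], params: List[float], decay: float) -> List[float]:
    """Return the updated EMA of a flat list of parameters.

    Each averaged weight moves toward its live counterpart by a
    fraction (1 - decay) per call. `decay` is a hypothetical value;
    the PR summary does not state the one actually used.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return [decay * a + (1.0 - decay) * p for a, p in zip(avg, params)]


# Usage: start the average at the initial weights, then update once per step.
avg = [1.0, 2.0]
avg = ema_update(avg, [0.0, 2.0], decay=0.9)
```

Evaluation would then use the averaged weights rather than the live ones.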
Novel Contributions
- Controlled ablation of recurrence scheduling on the April 5 SP8192 stack
- Comparison of hard switch versus wide and narrow recurrence ramps
- Finding that a later, narrower ramp (V3) improved step count and slightly improved pre-quantization validation bpb over V2
- Transition and throughput diagnostics for recurrence ramp behavior
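The difference between a hard switch and a ramp can be sketched as a gate in [0, 1] that controls how strongly the recurrent pass contributes at a given training step. This is a minimal illustration under stated assumptions: the function names, the linear ramp shape, the blending rule, and all step values are hypothetical and not taken from the PR.

```python
def hard_switch(step: int, start: int) -> float:
    """Gate jumps from 0 to 1 at `start` (hard-switch style schedule)."""
    return 1.0 if step >= start else 0.0


def linear_ramp(step: int, start: int, width: int) -> float:
    """Gate rises linearly from 0 at `start` to 1 at `start + width`.

    A small `width` gives a narrow ramp and a large one a wide ramp;
    a width of 0 degenerates to the hard switch.
    """
    if width <= 0:
        return hard_switch(step, start)
    return min(1.0, max(0.0, (step - start) / width))


def blend(base: float, recurrent: float, gate: float) -> float:
    """Mix the non-recurrent and recurrent outputs by the gate value."""
    return (1.0 - gate) * base + gate * recurrent


# Hypothetical schedules: the step numbers below are made up for illustration.
for step in (0, 100, 150, 200, 300):
    print(
        step,
        hard_switch(step, start=100),
        linear_ramp(step, start=100, width=200),  # wide ramp
        linear_ramp(step, start=150, width=50),   # later, narrower ramp
    )
```

A later start delays the extra per-step cost of recurrence, and a narrower width shortens the transition; both are the kind of knobs that the transition and throughput diagnostics above would be measuring.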