PR #2161

open

Add SP4096 qk45 budget reproduction candidate

by adiprathapa
val_bpb
1.1074
Architecture
Transformer
Optimizer
Artifact Size
15,987,195 bytes

Training Techniques

Evaluation
sliding window eval
parameters: null
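
The entry above lists no parameters for the sliding-window eval, so the following is only a minimal sketch of how such an evaluation is commonly set up, not the PR's actual code. The names `sliding_windows`, `bits_per_byte`, `context_len`, and `stride` are hypothetical. Each window re-reads earlier tokens as context but only scores tokens that no previous window has scored, and the summed negative log-likelihood (in nats) is converted to bits per byte.

```python
import math


def sliding_windows(n_tokens, context_len, stride):
    """Return (start, end, score_from) spans covering n_tokens.

    Tokens in [start, end) are fed to the model; only tokens in
    [score_from, end) contribute to the loss, so each token is
    scored exactly once. All names here are illustrative assumptions.
    """
    if not 0 < stride <= context_len:
        raise ValueError("stride must be in (0, context_len]")
    spans = []
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + context_len, n_tokens)
        spans.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return spans


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed NLL in nats over a text of total_bytes bytes to bpb."""
    return total_nll_nats / (math.log(2) * total_bytes)
```

A smaller stride gives each scored token more left context at the cost of more forward passes; the stride used for the reported 1.1074 is not stated here.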
Other
other
Budget reproduction/iteration of Kevin Clark's SP4096 record run using a 1xH100 RunPod setup with 86 train shards and a 3600-second wallclock cap.
parameters: {"seed":42,"train_shards":86,"max_wallclock_seconds":3600}
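
As an illustration of how the parameters above might be consumed, here is a hedged sketch of a training loop that stops once a wallclock cap is reached. Only the JSON values come from the entry; `run_with_wallclock_cap`, `step_fn`, and the injectable `clock` are hypothetical names, and the PR's training script may enforce the cap differently.

```python
import json
import time

# Parameter values copied from the entry above.
PARAMS = json.loads('{"seed":42,"train_shards":86,"max_wallclock_seconds":3600}')


def run_with_wallclock_cap(step_fn, max_wallclock_seconds, clock=time.monotonic):
    """Call step_fn(step_index) repeatedly until the cap elapses.

    The cap is checked before each step, so a step that is already
    running is allowed to finish. Returns the number of completed steps.
    """
    start = clock()
    steps = 0
    while clock() - start < max_wallclock_seconds:
        step_fn(steps)
        steps += 1
    return steps
```

Passing `clock` as an argument keeps the loop testable without waiting an hour; in real use it would be called as `run_with_wallclock_cap(train_step, PARAMS["max_wallclock_seconds"])` with a user-supplied `train_step`.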
other
Reproducibility guard that coarsens selected quantized tensors if the artifact slightly exceeds the 16,000,000-byte cap.
parameters: null
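
The guard described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: it assumes int8-quantized values stored as Python lists, zlib as the compressor, and coarsening by snapping values to multiples of a growing power-of-two step. The names `packed_size`, `coarsen`, `fit_under_cap`, and `candidates` are hypothetical. Only the 16,000,000-byte cap comes from the entry.

```python
import zlib

BYTE_CAP = 16_000_000  # cap stated in the entry above


def packed_size(tensors):
    """Total compressed size in bytes (assumed format: int8 values + zlib)."""
    return sum(
        len(zlib.compress(bytes(v & 0xFF for v in values), 9))
        for values in tensors.values()
    )


def coarsen(values, step):
    """Snap int8 values down to multiples of step (fewer distinct levels)."""
    return [(v // step) * step for v in values]


def fit_under_cap(tensors, candidates, cap=BYTE_CAP, max_step=16):
    """Coarsen only the candidate tensors, doubling the step until the
    packed artifact fits under cap. Other tensors are left untouched."""
    out = dict(tensors)
    step = 2
    while packed_size(out) > cap and step <= max_step:
        for name in candidates:
            # Always coarsen from the original values to avoid compounding error.
            out[name] = coarsen(tensors[name], step)
        step *= 2
    if packed_size(out) > cap:
        raise ValueError("artifact still exceeds the byte cap after coarsening")
    return out
```

If the artifact already fits, the loop body never runs and the tensors are returned unchanged, which matches the description of a guard that only acts when the cap is slightly exceeded.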

Novel Contributions

  • Non-record SP4096 budget reproduction candidate
  • Uses a 1xH100 RunPod budget setup
  • Runs with 86 SP4096 train shards and a 3600-second cap
  • Adds a reproducibility guard to fit the artifact under the byte cap by coarsening selected quantized tensors
  • Reports a best valid sliding-window eval result of 1.10743376 bpb