val_bpb
1.1074
Architecture
Transformer
Optimizer
—
Artifact Size
15,987,195 bytes
Training Techniques
Evaluation
sliding window eval
parameters: null
Other
other
Budget reproduction and iteration of Kevin Clark's SP4096 record run on a 1xH100 RunPod setup, with 86 train shards and a 3600-second wall-clock cap.
parameters: {"seed":42,"train_shards":86,"max_wallclock_seconds":3600}
other
Reproducibility guard that coarsens selected quantized tensors if the artifact slightly exceeds the 16,000,000-byte cap.
parameters: null
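A minimal sketch of how a size guard of this kind could work, since the entry lists no parameters for it. This is not the run's actual code. It assumes 8-bit unsigned quantization codes, zlib as a stand-in compressor, and "coarsening" implemented as zeroing low-order bits. The names `packed_size`, `coarsen`, and `fit_under_cap` are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

CAP_BYTES = 16_000_000  # artifact size cap stated in this entry


def packed_size(tensors: Dict[str, bytes]) -> int:
    """Compressed size of all tensors, concatenated in a fixed (sorted) order."""
    blob = b"".join(tensors[name] for name in sorted(tensors))
    return len(zlib.compress(blob, 9))


def coarsen(q: bytes, shift: int) -> bytes:
    """Zero the `shift` lowest bits of each value (assumed uint8 codes)."""
    mask = (0xFF << shift) & 0xFF
    return bytes(b & mask for b in q)


def fit_under_cap(
    tensors: Dict[str, bytes],
    candidates: List[str],
    cap: int = CAP_BYTES,
    max_shift: int = 8,
) -> Tuple[Dict[str, bytes], Dict[str, int]]:
    """Coarsen candidate tensors one at a time until the packed size fits.

    Returns the (possibly coarsened) tensors and the shift applied to each
    tensor that was touched. Tensors are left unchanged if they already fit.
    """
    current = dict(tensors)
    shifts: Dict[str, int] = {}
    if packed_size(current) <= cap:
        return current, shifts
    for shift in range(1, max_shift + 1):
        for name in candidates:
            # Always coarsen from the original codes, not from a prior pass.
            current[name] = coarsen(tensors[name], shift)
            shifts[name] = shift
            if packed_size(current) <= cap:
                return current, shifts
    raise ValueError("artifact still exceeds the cap after maximum coarsening")
```

The loop is deterministic: tensors are packed in sorted order and candidates are visited in the order given, so the same inputs always produce the same artifact. Stopping at the first configuration that fits keeps the loss of precision as small as this greedy order allows.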
Novel Contributions
- Non-record SP4096 budget reproduction candidate
- Uses a 1xH100 RunPod budget setup
- Runs with 86 SP4096 train shards and a 3600-second cap
- Adds a reproducibility guard that coarsens selected quantized tensors to keep the artifact under the 16,000,000-byte cap
- Reports a best valid sliding-window eval result of 1.10743376 bpb
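
For readers unfamiliar with the metric, the following is a generic sketch of a sliding-window bits-per-byte evaluation, not the run's actual evaluation code. It assumes a hypothetical callback `nll_fn` that returns, for each position in a chunk, the negative log-likelihood in nats of that token given the earlier tokens in the chunk. The window and stride values are illustrative only.

```python
import math
from typing import Callable, List, Sequence


def sliding_window_bpb(
    tokens: Sequence[int],
    n_bytes: int,
    window: int,
    stride: int,
    nll_fn: Callable[[Sequence[int]], List[float]],
) -> float:
    """Bits per byte over `tokens`, scoring every token exactly once.

    Windows of length `window` advance by `stride`. Each window re-reads
    earlier tokens as context but only adds the loss of tokens that no
    previous window has scored.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    total_nats = 0.0
    scored_upto = 0  # number of leading tokens already scored
    for begin in range(0, len(tokens), stride):
        end = min(begin + window, len(tokens))
        nll = nll_fn(tokens[begin:end])
        # Skip positions that an earlier window already counted.
        total_nats += sum(nll[scored_upto - begin:])
        scored_upto = end
        if end == len(tokens):
            break
    # Convert nats to bits, then normalize by the raw byte count.
    return total_nats / math.log(2) / n_bytes
```

As a sanity check, a uniform model over 4096 symbols costs 12 bits per token, so 10 tokens covering 30 bytes of text give 4.0 bpb. A smaller stride gives each scored token more context at the cost of more forward passes.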