PR #1810

open

Non-record: H100 SXM SP1024 baseline probe

by icpmacdo
val_bpb
1.3189
Architecture
Transformer
Optimizer
Artifact Size
14,061,665 bytes

Training Techniques

Quantization
int8
bits: 8
scope: model weights
Compression
zlib
level: null
Sequence Length
sequence_length
train_length: null
eval_length: null
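
The int8 quantization and zlib compression fields above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual export code: the function names (`quantize_int8`, `pack_artifact`, `unpack_artifact`), the symmetric per-tensor scale, and the choice to store that scale as a float32 header are all assumptions. `level: null` is read here as "use zlib's default level" (`-1`), which is also an assumption.

```python
import random
import struct
import zlib


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127] (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, q


def pack_artifact(weights, level=-1):
    """Serialize a float32 scale followed by int8 values, then zlib-compress the bytes."""
    scale, q = quantize_int8(weights)
    raw = struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(raw, level)  # level=-1 selects zlib's default


def unpack_artifact(blob):
    """Invert pack_artifact: decompress, read the scale, dequantize the int8 values."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack_from("<f", raw, 0)
    q = struct.unpack_from(f"<{len(raw) - 4}b", raw, 4)
    return [scale * v for v in q]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical weights; the real model tensors are not part of this summary.
    weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
    blob = pack_artifact(weights)
    print(f"float32 size: {4 * len(weights)} bytes, artifact size: {len(blob)} bytes")
```

Measuring `len(blob)` is how a figure like the Artifact Size field would be obtained under this scheme. The real pipeline may quantize per row or per channel and may count code bytes as well.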

Novel Contributions

  • Non-record baseline probe on 1x H100 SXM
  • Single-seed smoke/reproducibility run
  • Uses 10 SP1024 training shards with the full fixed validation split
  • Provides audited Runpod H100 setup and reproduction steps
  • Demonstrates PyTorch 2.9.1+cu128 compatibility for enable_gqa
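
The `enable_gqa` compatibility point in the last bullet can be illustrated with a small, dependency-free sketch. Two things here are assumptions rather than facts from the PR. First, the minimum version `(2, 5, 0)` is my understanding of when `torch.nn.functional.scaled_dot_product_attention` gained the `enable_gqa` keyword, and it should be checked against the PyTorch release notes. Second, the helper names and the head counts (8 query heads, 2 key/value heads) are hypothetical. The head mapping follows the grouped-query convention in which consecutive query heads share one key/value head.

```python
import re

# Assumption: enable_gqa first shipped in PyTorch 2.5; verify before relying on it.
MIN_GQA_VERSION = (2, 5, 0)


def parse_version(version):
    """Turn a version string such as '2.9.1+cu128' into (2, 9, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group or 0) for group in match.groups())


def supports_enable_gqa(version):
    """True if the given PyTorch version is at least the assumed minimum."""
    return parse_version(version) >= MIN_GQA_VERSION


def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Index of the key/value head that a query head attends with under GQA."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    print(supports_enable_gqa("2.9.1+cu128"))
    print([kv_head_for(h, 8, 2) for h in range(8)])
```

In a real training script the guard would be applied to `torch.__version__`, and the attention call would pass `enable_gqa=True` only when the guard succeeds, falling back to repeating the key/value heads manually otherwise.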