| Field | Value |
|---|---|
| val_bpb | 1.3189 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact Size | 14,061,665 bytes |
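val_bpb is bits per byte on the validation split. Assuming byte-level modeling, it can be converted from a mean cross-entropy loss in nats; a minimal sketch (the 0.9142-nat loss below is illustrative, not a reported number):

```python
import math

def bits_per_byte(mean_ce_loss_nats: float, tokens_per_byte: float = 1.0) -> float:
    """Convert mean cross-entropy loss (nats/token) to bits per byte.

    Assumes byte-level tokenization by default (tokens_per_byte = 1.0);
    for a subword tokenizer, pass total_tokens / total_bytes instead.
    """
    return mean_ce_loss_nats / math.log(2) * tokens_per_byte

bpb = bits_per_byte(0.9142)  # ≈ 1.3189 for a hypothetical 0.9142-nat loss
```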
Training Techniques

| Technique | Configuration |
|---|---|
| Quantization | int8 (bits: 8, scope: model weights) |
| Compression | zlib (level: null) |
| Sequence Length (`sequence_length`) | train_length: null, eval_length: null |
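The quantize-then-compress pipeline above (int8 weights, zlib artifact) can be sketched with the standard library alone. The symmetric per-tensor scaling and rounding here are assumptions, since the exact quantization scheme is not specified:

```python
import array
import zlib

def quantize_int8(weights, scale=None):
    """Symmetric per-tensor int8 quantization (assumed scheme)."""
    if scale is None:
        scale = max(abs(w) for w in weights) / 127 or 1.0
    q = array.array("b", (max(-128, min(127, round(w / scale))) for w in weights))
    return q, scale

def pack_artifact(weights, level=6):
    """Quantize weights to int8, then zlib-compress the raw bytes."""
    q, scale = quantize_int8(weights)
    return zlib.compress(q.tobytes(), level), scale

# Round-trip a tiny illustrative weight vector.
weights = [0.031, -0.12, 0.5, -0.498, 0.0]
blob, scale = pack_artifact(weights)
restored = [b * scale for b in array.array("b", zlib.decompress(blob))]
```

The level: null in the table suggests zlib's default was used; `pack_artifact` exposes `level` so that choice stays explicit.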
Novel Contributions
- Non-record baseline probe on 1× H100 SXM
- Single-seed smoke/reproducibility run
- Uses 10 SP1024 training shards with the full, fixed validation split
- Provides an audited Runpod H100 setup and reproduction steps
- Demonstrates PyTorch 2.9.1+cu128 compatibility for `enable_gqa`