| Field | Value |
|---|---|
| val_bpb | 1.3189 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact Size | 14,061,665 bytes |
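val_bpb is bits per byte on the validation split. Assuming byte-level modeling, it can be converted from a mean cross-entropy loss in nats; a minimal sketch (the 0.9142-nat loss below is illustrative, not a reported number):

```python
import math

def bits_per_byte(mean_ce_loss_nats: float, tokens_per_byte: float = 1.0) -> float:
    """Convert mean cross-entropy loss (nats/token) to bits per byte.

    Assumes byte-level tokenization by default (tokens_per_byte = 1.0);
    for a subword tokenizer, pass total_tokens / total_bytes instead.
    """
    return mean_ce_loss_nats / math.log(2) * tokens_per_byte

bpb = bits_per_byte(0.9142)  # ≈ 1.3189 for a hypothetical 0.9142-nat loss
```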
Training Techniques

| Technique | Configuration |
|---|---|
| Quantization | int8 (bits: 8, scope: model weights) |
| Compression | zlib (level: null) |
| Sequence Length (`sequence_length`) | train_length: null, eval_length: null |
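The quantize-then-compress pipeline above (int8 weights, zlib artifact) can be sketched with the standard library alone. The symmetric per-tensor scaling and rounding here are assumptions, since the exact quantization scheme is not specified:

```python
import array
import zlib

def quantize_int8(weights, scale=None):
    """Symmetric per-tensor int8 quantization (assumed scheme)."""
    if scale is None:
        scale = max(abs(w) for w in weights) / 127 or 1.0
    q = array.array("b", (max(-128, min(127, round(w / scale))) for w in weights))
    return q, scale

def pack_artifact(weights, level=6):
    """Quantize weights to int8, then zlib-compress the raw bytes."""
    q, scale = quantize_int8(weights)
    return zlib.compress(q.tobytes(), level), scale

# Round-trip a tiny illustrative weight vector.
weights = [0.031, -0.12, 0.5, -0.498, 0.0]
blob, scale = pack_artifact(weights)
restored = [b * scale for b in array.array("b", zlib.decompress(blob))]
```

The level: null in the table suggests zlib's default was used; `pack_artifact` exposes `level` so that choice stays explicit.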
Novel Contributions
- Non-record baseline probe on 1× H100 SXM
- Single-seed smoke/reproducibility run
- Uses 10 SP1024 training shards with the full, fixed validation split
- Provides an audited Runpod H100 setup and reproduction steps
- Demonstrates PyTorch 2.9.1+cu128 compatibility for `enable_gqa`