val_bpb: 1.3629
Architecture: Transformer
Optimizer: —
Artifact Size: 12.3 MB
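For reference, val_bpb is read here as validation bits per byte: the summed validation negative log-likelihood in nats, divided by ln(2) times the byte count of the validation text. A minimal sketch of that conversion; the function name and the toy numbers are illustrative only:

```python
# Sketch of the bits-per-byte conversion assumed behind val_bpb;
# bits_per_byte and the toy numbers below are illustrative, not
# values taken from this run's evaluation code.
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Summed validation NLL in nats -> bits per byte of validation text."""
    return total_nll_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    # Toy example: 1,000,000 validation bytes with a summed NLL of
    # 9.447e5 nats lands near the reported 1.3629.
    print(f"{bits_per_byte(9.447e5, 1_000_000):.4f}")
```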
Training Techniques
- Quantization: int8 (bits: 8, scope: all)
- Compression: zlib (level: null); see the packing sketch after this list
- Sequence Length (sequence_length): train_length: 1024, eval_length: null
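The Quantization and Compression entries are what bring the checkpoint down to the 12.3 MB artifact size. Below is a minimal sketch of that packing step, assuming a PyTorch state dict and symmetric per-tensor int8 scaling; the helper names and the exact scaling scheme are illustrative, not the pipeline used in this run:

```python
# Minimal sketch of int8 quantization followed by zlib compression.
# The scaling scheme and function names are assumptions for illustration.
import io
import zlib

import torch


def quantize_int8(t: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Symmetric per-tensor int8 quantization: t ~= q * scale."""
    scale = t.abs().max().item() / 127.0
    scale = scale if scale > 0 else 1.0  # guard all-zero tensors
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale


def pack_artifact(state_dict: dict) -> bytes:
    """Quantize every float tensor (scope: all), then zlib-compress the blob."""
    packed = {}
    for name, t in state_dict.items():
        if t.is_floating_point():
            q, scale = quantize_int8(t)
            packed[name] = {"q": q, "scale": scale}
        else:
            packed[name] = t  # leave integer buffers untouched
    buf = io.BytesIO()
    torch.save(packed, buf)
    # "level: null" in the config is read here as zlib's default level;
    # that reading is an assumption.
    return zlib.compress(buf.getvalue())


if __name__ == "__main__":
    # Toy example: a small linear layer standing in for the real model.
    sd = torch.nn.Linear(256, 256).state_dict()
    print(f"artifact: {len(pack_artifact(sd)) / 1e6:.3f} MB")
```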
Novel Contributions
- Reproducible single-GPU RunPod training run using unmodified stock train_gpt.py
- Documentation of baseline metrics and commands for transparency
- Use of int8 quantization with zlib compression to fit under the 16 MB artifact size limit
- 600-second wallclock training cap on a single GPU (a capped-loop sketch follows this list)
- No multi-seed sweeps or leaderboard-class tuning
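A hypothetical sketch of the 600-second wallclock cap mentioned above, assuming a simple monotonic-clock check around an opaque training step; stock train_gpt.py may enforce its budget differently, and train_step and TIME_BUDGET_S are illustrative stand-ins:

```python
# Hypothetical wallclock-budget loop; not code from train_gpt.py.
import time

TIME_BUDGET_S = 600.0  # the single-GPU cap used for this run


def train_step(step: int) -> None:
    """Placeholder for one optimizer step on the real model."""
    time.sleep(0.01)


def run_capped_training(budget_s: float = TIME_BUDGET_S) -> int:
    """Run training steps until the wallclock budget is exhausted."""
    start = time.monotonic()
    step = 0
    while time.monotonic() - start < budget_s:
        train_step(step)
        step += 1
    return step  # number of steps that fit in the budget


if __name__ == "__main__":
    # Short demo budget so the example finishes quickly.
    print(f"steps completed: {run_capped_training(budget_s=1.0)}")
```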