PR #630

open

Non-record: LapushBaby stock baseline 1xGPU RunPod

by LapushBaby
val_bpb: 1.3629
Architecture: Transformer
Optimizer:
Artifact Size: 12.3MB

Training Techniques

  • Quantization: int8 (bits: 8, scope: all)
  • Compression: zlib (level: null)
  • Sequence Length: sequence_length (train_length: 1024, eval_length: null)

Novel Contributions

  • Reproducible single-GPU RunPod training run using unmodified stock train_gpt.py
  • Documentation of baseline metrics and commands for transparency
  • Use of int8 quantization with zlib compression to fit under 16MB artifact size
  • 600-second wallclock training cap on 1 GPU
  • No multi-seed sweeps or leaderboard-class tuning
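The int8-quantize-then-zlib packaging described above can be sketched as follows. This is a minimal illustration assuming symmetric per-tensor quantization with a single float32 scale stored as a header; the helper names, header layout, and scale scheme are assumptions for illustration, not the PR's actual code:

```python
import zlib
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: one float32 scale per tensor."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def pack(w: np.ndarray) -> bytes:
    """Quantize to int8, prepend the scale, then zlib-compress the payload."""
    q, scale = quantize_int8(w)
    return np.float32(scale).tobytes() + zlib.compress(q.tobytes())

def unpack(blob: bytes, shape) -> np.ndarray:
    scale = float(np.frombuffer(blob[:4], dtype=np.float32)[0])
    q = np.frombuffer(zlib.decompress(blob[4:]), dtype=np.int8).reshape(shape)
    return dequantize_int8(q, scale)

w = np.random.randn(256, 256).astype(np.float32)
blob = pack(w)
w2 = unpack(blob, w.shape)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(w - w2)) <= np.abs(w).max() / 127.0 * 0.5 + 1e-6
```

Storing int8 values instead of float32 already cuts the raw checkpoint size 4x; zlib then removes remaining byte-level redundancy, which is how a packaging step like this can land a checkpoint under a 16MB artifact cap.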