PR #630

open

Non-record: LapushBaby stock baseline 1xGPU RunPod

by LapushBaby
val_bpb: 1.3629
Architecture: Transformer
Optimizer:
Artifact Size: 12.3MB

Training Techniques

  • Quantization: int8 (bits: 8, scope: all)
  • Compression: zlib (level: null)
  • Sequence Length: sequence_length (train_length: 1024, eval_length: null)

Novel Contributions

  • Reproducible single-GPU RunPod training run using unmodified stock train_gpt.py
  • Documentation of baseline metrics and commands for transparency
  • Use of int8 quantization with zlib compression to fit under 16MB artifact size
  • 600-second wallclock training cap on 1 GPU
  • No multi-seed sweeps or leaderboard-class tuning
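The int8-quantize-then-zlib packaging described above can be sketched as follows. This is a minimal illustration assuming symmetric per-tensor quantization with a single float32 scale stored as a header; the helper names, header layout, and scale scheme are assumptions for illustration, not the PR's actual code:

```python
import zlib
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: one float32 scale per tensor."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def pack(w: np.ndarray) -> bytes:
    """Quantize to int8, prepend the scale, then zlib-compress the payload."""
    q, scale = quantize_int8(w)
    return np.float32(scale).tobytes() + zlib.compress(q.tobytes())

def unpack(blob: bytes, shape) -> np.ndarray:
    scale = float(np.frombuffer(blob[:4], dtype=np.float32)[0])
    q = np.frombuffer(zlib.decompress(blob[4:]), dtype=np.int8).reshape(shape)
    return dequantize_int8(q, scale)

w = np.random.randn(256, 256).astype(np.float32)
blob = pack(w)
w2 = unpack(blob, w.shape)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(w - w2)) <= np.abs(w).max() / 127.0 * 0.5 + 1e-6
```

Storing int8 values instead of float32 already cuts the raw checkpoint size 4x; zlib then removes remaining byte-level redundancy, which is how a packaging step like this can land a checkpoint under a 16MB artifact cap.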