PR #405

open

Non-record: 1x RTX 3090 baseline run (sp1024, 1 shard)

by meett07
val_bpb
1.5516
Architecture
GPT
Optimizer
AdamW
Artifact Size
9,283,646 bytes

Training Techniques

Quantization
int8
bits: 8
scope: all
Compression
zlib
level: null
Sequence Length
sequence_length
train_length: 1024
eval_length: null
Other
other
Baseline non-record run on 1x RTX 3090 using fineweb10B_sp1024 with 1 training shard.
parameters: {"hardware":"1x RTX 3090 on RunPod","dataset":"fineweb10B_sp1024","tokenizer":"fineweb_1024_bpe.model","train_shards":1}

Novel Contributions

  • Documented non-record baseline run
  • 1x RTX 3090 RunPod setup
  • sp1024 dataset variant with 1 training shard
  • int8+zlib roundtrip submission
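The int8+zlib roundtrip listed above can be illustrated with a minimal sketch. The PR summary does not show the submission's actual code, so everything here is an assumption: symmetric per-tensor scaling, the default zlib level (the summary lists the level as null), and the hypothetical helper names `quantize_int8`, `pack`, and `unpack`.

```python
import array
import zlib


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme, not confirmed by the PR)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = array.array(
        "b", (max(-127, min(127, round(w / scale))) for w in weights)
    )
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


def pack(weights, level=-1):
    """Quantize to int8, then compress the raw bytes with zlib (default level)."""
    codes, scale = quantize_int8(weights)
    return zlib.compress(codes.tobytes(), level), scale


def unpack(blob, scale):
    """Decompress and dequantize; the inverse of pack up to rounding error."""
    codes = array.array("b")
    codes.frombytes(zlib.decompress(blob))
    return dequantize_int8(codes, scale)


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    blob, scale = pack(weights)
    restored = unpack(blob, scale)
    print(len(blob), "compressed bytes")
    print(restored)
```

In this sketch the roundtrip is lossy only in the quantization step: each restored weight differs from the original by at most half a quantization step (`scale / 2`), while the zlib stage is lossless.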