val_bpb: 1.5516
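val_bpb presumably denotes validation bits per byte. Assuming it is derived from the mean per-token cross-entropy (in nats) scaled by the token-to-byte ratio of the validation data (an assumption; the exact evaluation pipeline is not stated here), a minimal sketch with hypothetical example numbers:

```python
import math

def bits_per_byte(mean_ce_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert mean per-token cross-entropy (nats) to bits per byte:
    nats -> bits via division by ln(2), then rescale per token to per byte."""
    return (mean_ce_nats / math.log(2)) * (total_tokens / total_bytes)

# Hypothetical: CE of 4.30 nats/token, ~4 bytes per token on average
print(round(bits_per_byte(4.30, 1_000, 4_000), 4))
```

With roughly these inputs the result lands near the reported 1.55 bpb, but the actual run's token and byte counts are not given here.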
Architecture: GPT
Optimizer: AdamW
Artifact Size: 9,283,646 bytes
Training Techniques
- Quantization: int8 (bits: 8, scope: all)
- Compression: zlib (level: null)
- Sequence Length: train_length: 1024, eval_length: null
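The int8 quantization (scope: all) plus zlib compression (level: null, i.e. presumably the library default) suggests the artifact is a compressed int8 encoding of the weights. A minimal roundtrip sketch, assuming symmetric per-tensor quantization (the actual scheme is not specified here):

```python
import zlib
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: scale maps max |w| to 127."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def pack(q: np.ndarray) -> bytes:
    """zlib-compress the raw int8 bytes (default compression level)."""
    return zlib.compress(q.tobytes())

def unpack(blob: bytes, shape, scale: float) -> np.ndarray:
    """Decompress and dequantize back to float32."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale

# Roundtrip a random tensor standing in for one weight matrix
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
restored = unpack(pack(q), w.shape, s)
assert np.abs(w - restored).max() <= s  # error bounded by one quantization step
```

In this scheme the reconstruction error per weight is at most half a quantization step; "scope: all" is read as applying the same treatment to every tensor in the checkpoint.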
Other
Baseline non-record run on 1x RTX 3090 using fineweb10B_sp1024 with 1 training shard.
parameters: {"hardware":"1x RTX 3090 on RunPod","dataset":"fineweb10B_sp1024","tokenizer":"fineweb_1024_bpe.model","train_shards":1}
Novel Contributions
- Documented non-record baseline run
- 1x RTX 3090 RunPod setup
- sp1024 dataset variant with 1 training shard
- int8+zlib roundtrip submission