val_bpb: 1.5516
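val_bpb presumably denotes validation bits per byte. Assuming it is derived from the mean per-token cross-entropy (in nats) scaled by the token-to-byte ratio of the validation data (an assumption; the exact evaluation pipeline is not stated here), a minimal sketch with hypothetical example numbers:

```python
import math

def bits_per_byte(mean_ce_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert mean per-token cross-entropy (nats) to bits per byte:
    nats -> bits via division by ln(2), then rescale per token to per byte."""
    return (mean_ce_nats / math.log(2)) * (total_tokens / total_bytes)

# Hypothetical: CE of 4.30 nats/token, ~4 bytes per token on average
print(round(bits_per_byte(4.30, 1_000, 4_000), 4))
```

With roughly these inputs the result lands near the reported 1.55 bpb, but the actual run's token and byte counts are not given here.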
Architecture: GPT
Optimizer: AdamW
Artifact Size: 9,283,646 bytes
Training Techniques
- Quantization: int8 (bits: 8, scope: all)
- Compression: zlib (level: null)
- Sequence Length: train_length: 1024, eval_length: null
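The int8 quantization (scope: all) plus zlib compression (level: null, i.e. presumably the library default) suggests the artifact is a compressed int8 encoding of the weights. A minimal roundtrip sketch, assuming symmetric per-tensor quantization (the actual scheme is not specified here):

```python
import zlib
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: scale maps max |w| to 127."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def pack(q: np.ndarray) -> bytes:
    """zlib-compress the raw int8 bytes (default compression level)."""
    return zlib.compress(q.tobytes())

def unpack(blob: bytes, shape, scale: float) -> np.ndarray:
    """Decompress and dequantize back to float32."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale

# Roundtrip a random tensor standing in for one weight matrix
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
restored = unpack(pack(q), w.shape, s)
assert np.abs(w - restored).max() <= s  # error bounded by one quantization step
```

In this scheme the reconstruction error per weight is at most half a quantization step; "scope: all" is read as applying the same treatment to every tensor in the checkpoint.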
Other
Baseline non-record run on 1x RTX 3090 using fineweb10B_sp1024 with 1 training shard.
parameters: {"hardware":"1x RTX 3090 on RunPod","dataset":"fineweb10B_sp1024","tokenizer":"fineweb_1024_bpe.model","train_shards":1}
Novel Contributions
- Documented non-record baseline run
- 1x RTX 3090 RunPod setup
- sp1024 dataset variant with 1 training shard
- int8+zlib roundtrip submission