val_bpb
1.1111
Architecture
Transformer
Optimizer
—
Artifact Size
15,889,933 bytes
Training Techniques
Architecture
tied embeddings
Uses tied input/output embeddings.
parameters: null
Other
other
Trains entirely on the public validation shard by aliasing the validation file as both train and validation splits.
parameters: {"dataset_alias":"fineweb10B_sp1024_valonly"}
Quantization
int8
bits: 8
scope: all
Compression
zlib
level: null
Novel Contributions
- Trains entirely on the public validation shard.
- Uses the validation shard as both training and validation data via dataset aliasing.
- Reports a 10-minute 8xH100 record submission under the artifact cap.