val_bpb: 1.3855
Architecture: Transformer
Optimizer: —
Artifact Size: 12 MB
Training Techniques
- Quantization: int8 (bits: 8, scope: all)
- Compression: zlib (level: null)
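The int8-quantize-then-zlib-compress pipeline listed above can be sketched as follows. This is a minimal illustration, not the submission's actual code: the symmetric per-tensor scheme and the compression level are assumptions (the card leaves the zlib level null).

```python
import zlib
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: map [-max|w|, +max|w|] to [-127, 127]."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

# Demo on a random float32 weight matrix (stand-in for real model weights).
rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)

q, scale = quantize_int8(w)
raw = q.tobytes()
packed = zlib.compress(raw, level=9)  # level=9 is an assumption; the card does not specify one

print(f"float32: {w.nbytes} B, int8: {len(raw)} B, int8+zlib: {len(packed)} B")
```

Dequantization is `q.astype(np.float32) * scale`, so only the int8 tensor and one float per tensor need to be stored in the artifact.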
Novel Contributions
- Baseline GPT-style transformer trained with the official parameter-golf repository
- BPE tokenizer with a 1024-token vocabulary
- Int8 quantization with zlib-compressed final artifact
- No architectural modifications
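A byte-level BPE trainer of the kind the 1024-token tokenizer implies can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation; the demo text and the reduced `vocab_size` in the example are invented for brevity.

```python
from collections import Counter

def train_bpe(text: str, vocab_size: int = 1024) -> list[tuple[int, int]]:
    """Train byte-level BPE: repeatedly merge the most frequent adjacent pair.

    Starts from the 256 raw byte values, so at most vocab_size - 256 merges
    are learned; stops early if the sequence collapses to a single token.
    """
    ids = list(text.encode("utf-8"))
    merges: list[tuple[int, int]] = []
    next_id = 256
    while next_id < vocab_size:
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]
        merges.append(pair)
        # Replace every left-to-right occurrence of the pair with the new id.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges

# Demo on a tiny repetitive corpus with a reduced vocabulary budget.
text = "low lower lowest " * 50
merges = train_bpe(text, vocab_size=300)
print(f"learned {len(merges)} merges")
```

A real run would train on the model's corpus with `vocab_size=1024`; the merge list is what the tokenizer ships.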