PR #1989

open

Baseline submission (~1.385 BPB)

by abhinava-sai
val_bpb: 1.3855
Architecture: Transformer
Optimizer:
Artifact Size: 12 MB
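
For context, val_bpb is bits per byte on the validation set; a minimal sketch of how it is typically derived from mean token-level cross-entropy (the function name and example figures are illustrative, not taken from this submission):

```python
import math

def bits_per_byte(mean_ce_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean token-level cross-entropy (in nats) to bits per byte.

    Total information = mean_ce_nats * n_tokens nats; dividing by ln(2)
    converts to bits, and dividing by the raw byte count gives a
    tokenizer-independent score.
    """
    return (mean_ce_nats * n_tokens / math.log(2)) / n_bytes

# Illustrative numbers only: 2.0 nats/token over 1M tokens covering 3M bytes.
print(bits_per_byte(2.0, 1_000_000, 3_000_000))  # ~0.96
```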

Training Techniques

  • Quantization: int8 (bits: 8, scope: all)
  • Compression: zlib (level: null)
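
A minimal sketch of what int8 quantization over all weights followed by zlib compression could look like, assuming a PyTorch state dict and symmetric absmax scaling (the repo's actual scheme may differ):

```python
import io
import zlib

import torch

def quantize_and_compress(state_dict: dict[str, torch.Tensor]) -> bytes:
    """Per-tensor symmetric int8 quantization, then zlib over the serialized result.

    Each weight is stored as (int8 tensor, fp32 scale) so it can later be
    dequantized as q.float() * scale.
    """
    packed = {}
    for name, w in state_dict.items():
        w = w.float()
        scale = (w.abs().max() / 127.0).clamp(min=1e-12)  # absmax -> [-127, 127]
        q = torch.round(w / scale).to(torch.int8)
        packed[name] = (q, scale.item())
    buf = io.BytesIO()
    torch.save(packed, buf)
    return zlib.compress(buf.getvalue())  # default level, matching "level: null"
```

Writing the returned bytes to disk yields the compressed artifact; loading reverses the steps and restores each weight as q.float() * scale.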

Novel Contributions

  • Baseline GPT-style transformer trained with the official parameter-golf repository
  • BPE tokenizer with a 1024-entry vocabulary (a training sketch follows this list)
  • Int8 weight quantization, with the final artifact compressed via zlib
  • No architectural modifications
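
One plausible way to produce the 1024-entry BPE tokenizer, sketched with the Hugging Face tokenizers library (the corpus path and special token are placeholders; the official repository may build its tokenizer differently):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel
from tokenizers.trainers import BpeTrainer

# Byte-level BPE with a deliberately tiny 1024-entry vocabulary.
tokenizer = Tokenizer(BPE(unk_token=None))
tokenizer.pre_tokenizer = ByteLevel(add_prefix_space=False)

trainer = BpeTrainer(vocab_size=1024, special_tokens=["<|endoftext|>"])
tokenizer.train(files=["train.txt"], trainer=trainer)  # hypothetical corpus path
tokenizer.save("tokenizer.json")
```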