Learn how to fit a language model into 16MB. This site breaks down every technique used in OpenAI's Parameter Golf competition — from quantization and architecture tricks to test-time training — with interactive deep dives, real submission data, and code.

PRs Processed: 1342
Best Record BPB: 1.0810
Techniques: 1040
Deep Dives: 10
CURRENT RECORD: PR #1493 by bigbag
val_bpb: 1.0810
Architecture: Transformer
Optimizer: SGD
Size: ~15.99 MB
Quantization: GPTQ, int8
Architecture: depth recurrence, parallel residuals, weight tying, LeakyReLU, Partial RoPE, U-Net skip connections
Regularization: logit softcap
Optimizer: SGD
Weight Averaging: EMA
Compression: lzma
Evaluation: sliding window eval
Test-Time Training: score-first TTT
LR Schedule: cosine decay, warmdown
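
The val_bpb figure on the record card reads most naturally as validation bits per byte: the total number of bits the model needs to encode the validation text, divided by the text's length in bytes. The sketch below shows that conversion under stated assumptions: the evaluator is assumed to sum negative log-likelihood in nats over the whole text and divide by the UTF-8 byte count. The function name `bits_per_byte` and the sample numbers are hypothetical and are not taken from the competition's code.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte.

    total_nll_nats: sum of -log p(token) over every predicted token.
    total_bytes:    length of the same text in bytes (e.g. UTF-8 encoded).
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # makes the score independent of the tokenizer's vocabulary.
    return total_nll_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    # Illustrative numbers only: a model that spends exactly one bit
    # on each byte of an 11-byte string.
    text = "hello world"
    n_bytes = len(text.encode("utf-8"))
    nll_nats = n_bytes * math.log(2)
    print(f"{bits_per_byte(nll_nats, n_bytes):.4f} bits per byte")
```

Normalizing by bytes rather than tokens is what lets submissions with different tokenizers be compared on one leaderboard, since a larger vocabulary lowers the token count but not the byte count.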

Record Progression

[Chart: BPB across submissions. Series: Record BPB (is_record), All submissions, Baseline (1.9)]

Technique Category Frequency

Top Submissions

View all 1342 PRs →