Learn how to fit a language model into 16MB. This site breaks down every technique used in OpenAI's Parameter Golf competition — from quantization and architecture tricks to test-time training — with interactive deep dives, real submission data, and code.
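Submissions are ranked by bits per byte (bpb). As a minimal sketch of how that metric works (the function name and example numbers here are illustrative, not taken from the competition harness): bpb is the model's total negative log-likelihood over the evaluation text, converted from nats to bits and divided by the number of bytes.

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a text
    into bits per byte (bpb): divide by bytes, convert nats -> bits."""
    return total_nll_nats / (n_bytes * math.log(2))

# Illustrative numbers: 1000 bytes scored with a total NLL of 733 nats
# works out to roughly 1.06 bpb, in the neighborhood of the record.
print(bits_per_byte(733.0, 1000))
```

Lower is better: the 1.9 baseline means the model needs 1.9 bits to predict each byte of text, and the record has pushed that below 1.06.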

PRs Processed: 1614
Best Record BPB: 1.0576
Techniques: 1151
Deep Dives: 10
Current Record: PR #2014 by simonbissonnette
val_bpb: 1.0576
Architecture: Transformer
Optimizer: Muon
Size: 15.98 MB
Sequence Length: sequence_length
LR Schedule: warmdown
Test-Time Training: LoRA TTT
Architecture: SmearGate, XSA, Partial RoPE, depth recurrence, GQA, parallel decoder, SparseAttnGate
Optimizer: Muon
Weight Averaging: EMA
Quantization: GPTQ, LQER, AWQ-lite
Compression: pergroup
Evaluation: stride-based eval
Regularization: weight decay
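To make the 16MB size budget concrete, the per-group compression listed above can be sketched as follows. This is an illustrative symmetric int8 scheme with an assumed group size of 4; the actual submissions choose their own group sizes, bit widths, and rounding schemes (e.g. via GPTQ or AWQ-lite).

```python
def quantize_pergroup(weights, group_size=4):
    """Quantize a flat list of floats group by group: store one float
    scale per group plus one small integer code per weight.
    (Group size and int8 range are illustrative assumptions.)"""
    scales, codes = [], []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        # Symmetric scale: map the largest magnitude in the group to 127.
        scale = max(abs(w) for w in group) / 127 or 1.0
        scales.append(scale)
        codes.extend(round(w / scale) for w in group)
    return scales, codes

def dequantize_pergroup(scales, codes, group_size=4):
    """Reconstruct approximate weights from per-group scales and codes."""
    return [codes[i] * scales[i // group_size] for i in range(len(codes))]
```

Grouping matters because one outlier weight only degrades the precision of its own small group, not the whole tensor; the trade-off is one extra stored scale per group.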

Record Progression

[Chart: record BPB (is_record) across all submissions, from the 1.9 baseline down to the current record.]

Technique Category Frequency

[Chart: how often each technique category appears across submissions.]
Top Submissions

View all 1614 PRs →