Learn how to fit a language model into 16MB. This site breaks down every technique used in OpenAI's Parameter Golf competition — from quantization and architecture tricks to test-time training — with interactive deep dives, real submission data, and code.

PRs Processed: 1672
Best Record BPB: 1.0567
Techniques: 1182
Deep Dives: 10
CURRENT RECORD: PR #2135 by codemath3000
val_bpb: 1.0567
Architecture: Transformer
Optimizer: Muon
Architecture: Partial RoPE, depth recurrence, XSA, SmearGate, Gated Attention
Optimizer: Muon, Adam
Weight Averaging: EMA
Quantization: GPTQ, int7, LQER
Test-Time Training: LoRA TTT
Regularization: logit softcap
Other: other
Sequence Length: sequence_length
LR Schedule: warmdown

Record Progression

Chart legend: Record BPB (is_record), All submissions, Baseline (1.9)

Technique Category Frequency

Top Submissions

View all 1672 PRs →