| 0.0000 | Middle-Out Compression: 0.0000 bpb (Shannon Limit Broken) | hypery11 | #721 |
| 0.0000 | Record: Nacrith Log-Bias + Full-Rescore N-gram — val_bpb 0.00000035 (3-seed mean) | himanalot | #959 |
| 0.0109 | Record: Packed Causal N-gram + Dirichlet Backoff — val_bpb 0.0109 (3-seed mean, NEW SOTA) | sofiabod | #1076 |
| 0.0165 | Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean) | aamodbhatt | #943 |
| 0.0165 | Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean) | aamodbhatt | #944 |
| 0.0180 | Record: Packed Causal N-gram + Dirichlet Backoff — val_bpb 0.0180 (3-seed mean) | sofiabod | #1056 |
| 0.0214 | Record: 0.0214 bpb - Low Eval-Time Memory Regime: Packed Training N-gram Artifact + Learned Gate (No Phrase Cache) | AnirudhRahul | #962 |
| 0.0235 | Record: Packed N-gram + Dirichlet CTW — val_bpb 0.0235 (1xB200) | minh-stakc | #1114 |
| 0.0274 | Record: Order-16 Frozen N-gram Oracle + Learned Gate + TTT — val_bpb 0.0274 (3-seed mean) | TimPietrusky | #945 |
| 0.0280 | Order-16 Frozen N-gram Oracle + Score-First TTT (0.02801 BPB) | THUQiXuan | #924 |
| 0.0281 | Record: Frozen N-gram Oracle (Order-16) + Score-First TTT (0.02807 BPB) | THUQiXuan | #925 |
| 0.0308 | Order-13 N-gram Oracle + Score-First TTT (0.0308 BPB) | THUQiXuan | #883 |
| 0.0498 | Record: 0.0498 bpb - Packed Training N-gram Artifact + Learned Weighting Gate | AnirudhRahul | #931 |
| 0.0638 | Record: Fort Knox — Legal Packed Training Cache, Zero Val Adaptation (val_bpb 0.0638, 3-seed) | haikosys | #982 |
| 0.0804 | Record: CacheMoney — 0.0804 BPB (3-seed mean, std 0.00003) | haikosys | #933 |
| 0.0830 | Evidence-aware Dirichlet concentration, 35% improvement over fixed c=5.0 | immartian | #1024 |
| 0.0830 | Record: Packed N-gram + Two-Pass Dirichlet CTW — val_bpb 0.0830 (3-seed mean) | sofiabod | #986 |
| 0.0881 | Record: 0.0881 BPB — 11L Int5 GPTQ + Order-12 N-gram + Phrase Cache + 65K Chunks | callithyia | #961 |
| 0.0887 | Record: Cache Is All You Need — val_bpb 0.0887, 622KB artifact (3-seed mean) | RoyiRa | #913 |
| 0.0905 | Record: Seed-Regenerated Random Model + Incremental N-gram Cache — val_bpb 0.0905 | vimeto | #1095 |
| 0.0935 | Record: BROADSIDE — Full-Rescore N-gram Cache (val_bpb 0.0935) | simon-marcus | #870 |
| 0.0939 | Record: Order-13 Full-Rescore N-gram + 11L Int6 GPTQ — val_bpb 0.0939 (3-seed mean) | TimPietrusky | #921 |
| 0.0942 | Record: Fast Full-Rescore N-gram — val_bpb 0.09420444 (3-seed mean) | aamodbhatt | #888 |
| 0.0960 | Record: Two-Pass Order-12 Shared N-gram Tables — val_bpb 0.0960 (3-seed mean) | resouer | #907 |
| 0.0972 | Record: Order-14 N-gram Full-Rescore — val_bpb 0.0972 | greqone | #922 |
| 0.0990 | Record: WaterLOO — Full-Rescore N-gram Cache with Self-Exclusion (val_bpb 0.0990) | simon-marcus | #881 |
| 0.1003 | Record: PhraseCache + OrderAdaptive N-gram + RegimeTracker — val_bpb 0.1003 (3-seed mean) | RoyiRa | #880 |
| 0.1130 | Record: Single-Pass Packed N-gram + Dirichlet CTW — val_bpb 0.1130 (3-seed mean) | sofiabod | #1030 |
| 0.1154 | Record: Order-20 Dirichlet Posterior + Phrase Cache — 0.11545 BPB (3-seed) | dentity007 | #968 |
| 0.1156 | Two-Level Dirichlet Posterior + Phrase Cache — 0.11556 BPB (3-seed) | dentity007 | #948 |
| 0.1156 | Record: Two-Level Dirichlet Posterior Mixing with Per-Order OBCL -- 0.1156 BPB | Robby955 | #900 |
| 0.1181 | Record: Budgeted Two-Pass N-gram Backoff — val_bpb 0.11814796 (3-seed mean) | aamodbhatt | #868 |
| 0.1290 | Record: N-gram Two-Pass Score-First Evaluation (0.1290 BPB) | THUQiXuan | #869 |
| 0.1310 | Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon — val_bpb 0.1310 (3-seed) | aryanbhosale | #893 |
| 0.1315 | Record: Two-Pass Order-12 N-gram Backoff + 256K Chunks — 0.1315 BPB | quietsmile | #853 |
| 0.1434 | Record: Two-Pass N-gram Rescoring (val_bpb 0.1434) | himanshudongre | #846 |
| 0.1582 | Record: 0.1582 BPB — Learned Mixer Head + No TTT + Matrix LR 0.03 | bigbag | #859 |
| 0.1653 | Record: TurboQuant + Full-Rescore N-gram (val_bpb=0.1653) | haikosys | #918 |
| 0.1663 | Record: 0.1663 BPB - N-gram-Aware Training + Frozen N-gram Oracle + Backoff TTT | AnirudhRahul | #834 |
| 0.2071 | submission: Order-Adaptive N-gram Cache — 0.2071 BPB | RoyiRa | #851 |
| 0.2282 | [RECORD] L-BFGS SLOT + Entropy-Adaptive N-gram Mixer (0.2282 BPB) | ChideraIbe123 | #1507 |
| 0.2532 | The Kitchen Sink | MichaelMcCulloch | #1111 |
| 0.2834 | Record: Order-12 N-gram Backoff + 256K Chunks — 0.2834 BPB | quietsmile | #843 |
| 0.2841 | Record: 11L Parallel Muon + N-gram Backoff Cache — val_bpb 0.2841 (3-seed mean) | aryanbhosale | #864 |
| 0.2841 | Record: 11L Parallel Muon + N-gram Backoff Cache — val_bpb 0.2841 (3-seed mean) | aryanbhosale | #865 |
| 0.2873 | Record: 0.2873 BPB — Fine-Grained N-gram Cache (65K chunks) | quietsmile | #840 |
| 0.2951 | Record: Order-9 N-gram Backoff + Score-First TTT + GPTQ-Int5 (0.2951 BPB) | himanshudongre | #826 |
| 0.2952 | Record: Chunk-Based N-gram Backoff + Score-First TTT (0.295 BPB) | AayushBaniya2006 | #809 |
| 0.2988 | Score-First TTT + Causal N-gram (order=82) — val_bpb 0.29882 (3-seed mean) | renqianluo | #1605 |
| 0.3212 | Record: 0.3212 BPB — Complementary N-gram 65K + Int5 GPTQ + LoRA TTT | callithyia | #850 |
| 0.3461 | 10L + PPM Full-Rescore Order-12 N-gram (0.3461 BPB) | Bortlesboat | #912 |
| 0.3461 | 10L + PPM Full-Rescore Order-12 N-gram (0.3461 BPB) | Bortlesboat | #916 |
| 0.3509 | Non-record: TurboQuant + N-gram Hybrid Eval + TTT (1xH100 NVL) | bsisduck | #1452 |
| 0.3509 | Non-record: TurboQuant + N-gram Hybrid Eval + TTT (1xH100 NVL) | bsisduck | #1454 |
| 0.3693 | Add non-record 16MB submission: Dirichlet PPM + Legal TTT on 8xH100 | JDAppleseed | #1159 |
| 0.3779 | RFC: A framework for deciding the n-gram question | abaybektursun | #886 |
| 0.3922 | Normalized N-gram + Bayesian First-Match (val_bpb 0.3922) | Idan3011 | #972 |
| 0.3964 | Record: Per-Sample SLOT + N-gram Order-22 + TTT + LR=0.432 — val_bpb 0.39642 (3-seed mean) | renqianluo | #1430 |
| 0.4027 | Record: 0.4027 BPB — Swarm-Designed Causal BackoffNgramMixer (3-seed mean, std 0.0015) | michaelwinczuk | #1094 |
| 0.4118 | Record Submission: HDC_1_Step_Grad_DSV_Radial_Slyvester_Hadamard_Matrix_Symmetry_Language_Model_val_bpb: 0.4118 | viasky657 | #1461 |
| 0.4162 | Record: 0.4162 BPB mixed quant ngram (post-fix reruns) | LucasErcolano | #1379 |
| 0.4188 | Record: 0.4188 BPB mixed quant ngram | LucasErcolano | #1359 |
| 0.4311 | Record: 0.4311 BPB - Complementary Training + Backoff N-gram Mixer + TTT | Naazimsnh02 | #1033 |
| 0.4377 | Record: Complementary Training + Backoff N-gram Mixer — 0.4377 BPB | quietsmile | #811 |
| 0.4380 | V18 Manifold-Guided Architecture — val_bpb 0.434 | raahilg | #663 |
| 0.4405 | Record: Order-Adaptive 9-gram Backoff + Distributed Prefill — val_bpb 0.4405 (3-seed mean) | sofiabod | #890 |
| 0.4416 | Record: 0.4416 BPB -- Complementary Training + Backoff N-gram Mixer | pentxayc | #803 |
| 0.4820 | Record: X-WING 3D Cubric + Complementary Training (val_bpb=0.4820) | newjordan | #814 |
| 0.4961 | Bandit: ClownCar Crawler x Cubric Ngram9 — 0.4961 BPB, 9.9mb | newjordan | #1083 |
| 0.5000 | [Non-record] ABRAM_CHIP v2 — HECR int16 ultra compact — 34 KB — 0.50 bpb | abrahaw123-cell | #475 |
| 0.5116 | Add Nairi submission: 9L 512D vocab1024 | SlavH | #1723 |
| 0.5440 | Record: Order-Adaptive BackoffMixer (mean val_bpb=0.5440) | hypery11 | #825 |
| 0.5466 | Record: Order-Adaptive Entropy Gating + BackoffNgramMixer (val_bpb=0.5466) | travispchen | #798 |
| 0.5527 | [record] add HWNODE record - 0.5527 | lucamignatti | #818 |
| 0.5588 | parameter golf submission - Julius | magicjulio | #722 |
| 0.5601 | Record: Chimera TTT — K-Projection LoRA + Min-NLL (0.5601 BPB, 3-seed mean) | teddyoweh | #611 |
| 0.5644 | Record: X-WING — Shared N-gram Tables + Cubric (val_bpb=0.5644) | newjordan | #800 |
| 0.5755 | Add Conker-5 tandem residual exact experts non-record submission | asuramaya | #998 |
| 0.5863 | 10L + Two-Pass Order-11 N-gram Backoff (0.5863 BPB) | Bortlesboat | #876 |
| 0.6361 | Record: Per-Sample SLOT + TTT + LR=0.024 + Stride=96 — val_bpb 0.63614 (3-seed mean) | renqianluo | #1328 |
| 0.6361 | Record: Per-Sample SLOT + TTT + LR=0.024 + Stride=96 — val_bpb 0.63614 (3-seed mean) | renqianluo | #1329 |
| 0.6364 | Record: 0.6364 BPB - Depth Recurrence + Multi-Order N-gram Backoff | Naazimsnh02 | #808 |
| 0.6430 | Record: DeepQuant V10b — 11L INT6 + 8ep LoRA TTT (val_bpb=0.6430) | AriaAnima | #596 |
| 0.6567 | Record: 0.6567 BPB — Prefill Cache + 7-Gram Entropy-Adaptive + EBLS | Robby955 | #796 |
| 0.6580 | Record: Trinity SLOT v3 + Pre-Quant TTT — val_bpb 0.65802 (3-seed mean) | deborahnelson8788726 | #1722 |
| 0.6671 | Record: BackoffNgramMixer (mean val_bpb=0.6671) | hypery11 | #813 |
| 0.6672 | Record: 11L + Multi-Order N-gram Backoff + Entropy-Adaptive Alpha (val_bpb=0.6672) | minh-stakc | #770 |
| 0.6678 | Record: Backoff N-gram Cache + LeakyReLU(0.9)² (val_bpb=0.6678) | ibarrajo | #806 |
| 0.6683 | Record: BackoffNgramMixer + Drift-Free TTT (3-seed mean val_bpb=0.6683) | deanbrr | #779 |
| 0.6846 | Notable Non-Record: H-Net Tokenization — 0.6846 BPB — Hierarchical Token Processing | gowtham0992 | #1121 |
| 0.6864 | Record: 0.6864 BPB — K-LoRA + Min-NLL + FlashAttention-3 | bigbag | #614 |
| 0.6951 | Record: 11L LeakyReLU² XSA-all GPTQ-AR SLOT64 — 0.6951 BPB | canivel | #1319 |
| 0.7093 | Non-record: Discriminative TTT + SLOT-24, 3-seed verified (8xH100 SXM) | Abhishek8108 | #1414 |
| 0.7094 | Record: SLOT-24 + Pre-quant TTT — val_bpb 0.7094 (3-seed mean) | stukenov | #1376 |
| 0.7139 | Record: LeakyReLU(0.5)² + Legal Per-Document LoRA TTT + GPTQ-lite (mean val_bpb=0.7139, 3 seeds) | robinojw | #762 |
| 0.7227 | Record: 0.7227 BPB — 10L LoRA TTT 6ep + FlashAttention-3 | bigbag | #605 |
| 0.7406 | Record: SLOT-48 — val_bpb 0.7406 (3-seed mean) | anthony-maio | #1321 |
| 0.7614 | ClownCar: Frugendorff compression baseline + canonical DeltaNet integration | newjordan | #990 |
| 0.7625 | [Non-record] Megakernel Saturation Study: 5 Triton fusion variants cannot beat torch.compile at 27M scale | ChideraIbe123 | #1679 |
| 0.7853 | Record: PROTEUS v8 — 11L INT6 + LoRA TTT 5ep cosine (mean val_bpb=0.7853, 4 seeds) | MatoTeziTanka | #568 |
| 0.8004 | Non-record (WIP): Multi-Order N-gram Backoff — val_bpb=0.8004 (1xH100 proxy) | greqone | #871 |
| 0.8100 | Ternary Universal Transformer — 15.6MB, bfloat16, Muon optimizer | alons23 | #216 |
| 0.8104 | Medusa: Unstable — DeltaNet Crawler 0.8104 BPB (best seed), 10mb file size, mean 0.9984, Frugendorff continuation | newjordan | #1028 |
| 0.8128 | 0.8128 BPB: Classical Compression Eval + N-gram Backoff on PR #549 Base | shinegami-2002 | #786 |
| 0.8173 | Record: 11L + Score-Every-Epoch LoRA TTT 5ep (3-seed mean val_bpb=0.8173) | minh-stakc | #642 |
| 0.8265 | Record: SP1024 + SLOT-24 + QK5.25 + Pre-Quant AdamW TTT — val_bpb 0.8265 (3-seed mean) | ndokutovich | #1488 |
| 0.8275 | Record: val_bpb 0.8275 (3-seed mean) — SLOT-28 + VRL + QK-Gain 4.0 + XSA-11 | yahya010 | #1324 |
| 0.8503 | non-record submission: mean-delta warm start + depth recurrence for SLOT (0.8503 BPB) | JKSNS | #1368 |
| 0.8508 | PROTEUS+STYX — val_bpb 0.8508 (3-seed mean) — LeakyReLU(0.9)² + 5-gram Eval Cache | MatoTeziTanka | #769 |
| 0.8609 | Record: 11-gram Eval Cache + Hedge Mixer (val_bpb: 0.8609) | sunnypatneedi | #909 |
| 0.8609 | Record: 11-gram Eval Cache + Hedge Mixer (val_bpb: 0.8609) | sunnypatneedi | #963 |
| 0.8637 | Record: SLOT-24 Aggressive — val_bpb 0.8637 (3-seed mean) | anthony-maio | #1313 |
| 0.8705 | Record: 11L EMA + GPTQ-lite + LeakyReLU^2 + QAT@0.15 | NandhuRajRK | #926 |
| 0.8822 | (0.8822 BPB mean) Medusa: Unstable S2 — DeltaNet Crawler, Legal 10mb, 0.77 bpb single seed | newjordan | #1047 |
| 0.8881 | Record: 11L + order-adaptive 11-gram (mean val_bpb=0.8881) | hypery11 | #795 |
| 0.8960 | Record: 7-gram N-gram Cache (0.8960 bpb) | armantsaturian | #797 |
| 0.9059 | Record: 11L + order-adaptive 9-gram backoff (mean val_bpb=0.9059) | hypery11 | #788 |
| 0.9076 | Record: 0.9076 BPB — 10L + N-gram Backoff + Matrix LR 0.03 | bigbag | #828 |
| 0.9123 | 10L + Multi-Order N-gram Backoff (0.9123 BPB) | Bortlesboat | #802 |
| 0.9209 | Non-record 1xH100 backoff7gram zlib-fallback sign-of-life (val_bpb 0.9209) | RichiiiTV | #767 |
| 0.9258 | Record Submission: 0.9258 BPB — Kitchen Sink (7-gram + XSA6 + BigramHash4K + Cosine TTT) | agalimova | #776 |
| 0.9300 | Record: Scored-Position SLOT + Per-Sample Delta + GPTQ (val_bpb: 0.9300) | resouer | #1229 |
| 0.9354 | Record: 11L LeakyReLU² + XSA-all + QK-Gain 4.0 + Full GPTQ + SLOT — val_bpb 0.9354 (3-seed mean) | xexyz | #1263 |
| 0.9362 | Podracing III: Cubric Lite — 0.9362 BPB | newjordan | #782 |
| 0.9370 | Record: Order-Adaptive Entropy Gating + XSA-All (val_bpb=0.9370) | travispchen | #774 |
| 0.9393 | Record: EMA-GPU + Multi-Order N-gram Backoff + PE Confidence (val_bpb=0.9393) | Idan3011 | #810 |
| 0.9443 | Record: LeakyReLU(0.5)² + Per-Document LoRA TTT (mean val_bpb=0.9443, 3 seeds) | robinojw | #620 |
| 0.9462 | Record: SLOT + QK-Gain 4.0 + XSA-11 — val_bpb 0.9462 (3-seed mean) | anthony-maio | #1303 |
| 0.9485 | Record: Scylla + Full GPTQ + XSA-all + FA3 — val_bpb 0.9485 (3-seed mean) | icryo | #1184 |
| 0.9512 | Record: PROTEUS v7 — 11L INT6 + LoRA TTT (mean val_bpb=0.9512, 3 seeds) | MatoTeziTanka | #512 |
| 0.9581 | Record: Score-First TTT + N-gram Backoff (3-seed mean val_bpb=0.9581) | Asukabot0 | #761 |
| 0.9581 | Record: Score-First TTT + Multi-Order N-gram Backoff (3-seed mean val_bpb=0.9581) | antaloaalonso | #940 |
| 0.9588 | [Val Only]: MLP 3x + STE int6 QAT + sliding window, val_bpb=0.9588 | andrewgcodes | #120 |
| 0.9605 | Record: 11L Full GPTQ + Multi-Order N-gram Backoff (fixed-alpha 0.9757 / entropy-adaptive 0.9605, 3-seed) | raahilshah | #778 |
| 0.9623 | Record: 0.9623 BPB — 7-Gram Entropy Cache + XSA-all + EBLS | Robby955 | #777 |
| 0.9625 | Podracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964) | newjordan | #753 |
| 0.9631 | Record: 11L XSA + Mixed INT6 + Adaptive N-gram Cache (2->7 backoff) - val_bpb=0.9631, 3-seed | aerosta | #993 |
| 0.9633 | Record: Curriculum Learning + LeakyReLU(0.9)² + 7-gram Backoff (val_bpb=0.9633) | ndokutovich | #764 |
| 0.9641 | [10min_16mb] 0.9641 BPB: LeakyReLU² + Score-First TTT + N-gram Backoff Cache | skoustav35 | #1185 |
| 0.9642 | Record: N-gram Backoff + VRL + LeakyReLU² — val_bpb 0.9642 (3-seed mean) | anthony-maio | #887 |
| 0.9642 | Record: N-gram Backoff + VRL + LeakyReLU² — val_bpb 0.9642 (3-seed mean) | anthony-maio | #889 |
| 0.9642 | Non-record: Fused Softcap+CE Megakernel (1.94x vs torch.compile) + N-gram Backoff | anthony-maio | #915 |
| 0.9650 | Record: Trinity Ternary GPT — val_bpb 0.9650 (ternary roundtrip) | deborahnelson8788726 | #1246 |
| 0.9674 | Record: First Legal Sub-1.0 BPB — Multi-order N-gram Backoff + Entropy-Adaptive Alpha (val_bpb=0.9674, 3-seed) | Asukabot0 | #727 |
| 0.9693 | SOTA Record: Novel Test-Time Method TARA Val BPB=0.97 under 4min (training-free unlike TTT) | sanyalsunny111 | #1055 |
| 0.9789 | Record*: val_bpb=0.978 BPB — Goldfish ML Autonomous Research (100ep Cosine *leaky* TTT) | lukacf | #517 |
| 0.9850 | Record: Cosine TTT + Multi-Order N-gram Cache (3-seed mean val_bpb=0.9850) | andrewbaggio1 | #741 |
| 0.9850 | Non-record: Negative Results — Architecture, TTT Variants, Quantization, and N-gram Cache Illegality | andrewbaggio1 | #1186 |
| 0.9901 | MDLM Diffusion — val_var_bpb 0.9901, EOS learning + full dataset shard rotation, 33M params, 1x AWS A10G | aiejvn | #1241 |
| 0.9917 | Record: 11L XSA-all + backoff 7-gram (mean val_bpb=0.9917) | hypery11 | #763 |
| 0.9958 | Record: LeakyReLU(0.9)² + N-gram Cache + Entropy-Reg QAT — val_bpb 0.9958 (3-seed mean) | lolrazh | #885 |
| 1.0030 | Non-record: 10L Gated DeltaNet (PureGDN) — val_bpb 1.003028 (3-seed mean, legal TTT) | Christopher-Lee-McClendon | #1370 |
| 1.0046 | Record: L-BFGS Causal SLOT — val_bpb 1.0046 (3-seed mean) | resouer | #1350 |
| 1.0050 | V20: Cascaded 2-Phase L-BFGS Causal SLOT (1.00497 BPB, 3-seed) | Bortlesboat | #1372 |
| 1.0095 | Record: TTT-AdamW + SLOT L-BFGS25 LogitDelta + GPTQ DAMP=0.005 — val_bpb 1.00955 | renqianluo | #1318 |
| 1.0098 | Record: GatedDeltaNet FLA + Score-First TTT + Brotli — val_bpb 1.00980 (3-seed mean) | aamodbhatt | #1711 |
| 1.0099 | Record: GatedDeltaNet (FLA) + Legal Score-First TTT — val_bpb 1.00995 (3-seed mean) | arsenis-cmd | #1698 |
| 1.0108 | Record: GatedDeltaNet + Legal TTT + Brotli-11 — val_bpb 1.01080 (3-seed mean, VALID artifacts) | yahya010 | #1734 |
| 1.0116 | Non-record: Sequential Momentum TTT (val_bpb=1.0116, 3-seed mean, 4xA10G) | connectwithprakash | #807 |
| 1.0119 | Record: GDN-Hybrid + TMA Megakernel + Brotli-11 — val_bpb 1.01195 (3-seed mean) | andrewbaggio1 | #1672 |
| 1.0167 | Record: GDN-Hybrid + Sliding Window Attention + compressed-code warmdown1000 (cold-cache, 1.01671 BPB) | joshkmartinez | #1575 |
| 1.0167 | Record: GDN-Hybrid + Sliding Window Attention + compressed-code warmdown1000 - val_bpb 1.01671 (3-seed mean) | joshkmartinez | #1576 |
| 1.0171 | Record: GDN-Hybrid + Sliding Window Attention (cold-cache, 1.01710 BPB) | joshkmartinez | #1564 |
| 1.0171 | Record: GDN-Hybrid + Sliding Window Attention (cold-cache, 1.01710 BPB) | joshkmartinez | #1622 |
| 1.0190 | Record: GatedDeltaNet FLA + Brotli (No TTT) — val_bpb 1.01902 (3-seed mean) | aamodbhatt | #1712 |
| 1.0205 | Record: GDN-Hybrid (Gated DeltaNet + Sliding Window Attention) - quantized_bpb 1.02046 | joshkmartinez | #1562 |
| 1.0205 | Record: GDN-Hybrid (Gated DeltaNet + Sliding Window Attention) | joshkmartinez | #1563 |
| 1.0208 | GDN-Hybrid: Fix warmdown/SWA/QAT timing — 1.0208 BPB | OE-GOD | #1681 |
| 1.0217 | SOTA Attempt: Paid prefix (val_bpb=1.0238) | spokane-way | #168 |
| 1.0222 | Record: XSA-all + Depth Recurrence + Hedge Mixer TTT (val_bpb=1.0222, 3-seed mean) | stukenov | #745 |
| 1.0226 | New Record: Pure Neural GDN 1.0226 BPB (shalyhinpavel) | shalyhinpavel | #875 |
| 1.0244 | Record: 1.0240 BPB — Multi-Order N-gram Backoff + Entropy-Adaptive Alpha (100% autonomous research via goldfish) | lukacf | #702 |
| 1.0274 | Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.0274 (2-seed mean) | Hkoyuer | #1632 |
| 1.0277 | Record: PR #1738 + PreQuant TTT LR=1e-3 + Unfrozen — val_bpb 1.02767 (3-seed mean) | kilojoules | #1758 |
| 1.0278 | Record: XSA-all + Depth Recurrence + Hedge Mixer TTT (val_bpb=1.0278, 3-seed mean) | stukenov | #733 |
| 1.0283 | Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.028308 (3-seed cold-cache mean) | Abhishek8108 | #1544 |
| 1.0283 | Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.028308 (3-seed cold-cache mean) | Abhishek8108 | #1545 |
| 1.0321 | Gravity Tokenizer: 1.0321 BPB via ablation leverage vocabulary optimization | dcrow85 | #755 |
| 1.0337 | Record: XSA-all + LeakyReLU² + VR + GA + 7-gram cache (val_bpb=1.0337) | Asukabot0 | #715 |
| 1.0339 | Record: K_KVShare_Wider FLA — val_bpb 1.0339 (3-seed mean) | genji0306 | #1705 |
| 1.0340 | 11L LeakyReLU² + XSA-all + Full GPTQ + 5-gram Backoff (1.0340 BPB) | xexyz | #792 |
| 1.0354 | Record: PR #1735 + CaseOps Tokenizer V15 (val_bpb 1.03540, mean of 3 seeds) | alertcat | #1738 |
| 1.0362 | Record: 1.0362 BPB — SGD Momentum 0.95 TTT + HedgeMixer + Per-Layer LR | dexhunter | #995 |
| 1.0365 | Record: 8L Paid Prefix + Sparse Hard Blocks (1.0365) | nicolasdickenmann | #278 |
| 1.0366 | Record: Chained TTT — Cosine Recovery + Multi-Pass Scoring (3-seed mean val_bpb=1.0366) | andrewbaggio1 | #685 |
| 1.0400 | Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA | pentxayc | #731 |
| 1.0409 | Record: K_KVShare_Wider full-recipe FLA — val_bpb 1.04090 (3-seed mean) | resouer | #1687 |
| 1.0429 | Record: SP8192 + Parallel Pre-Quant TTT — val_bpb 1.0429 (3-seed mean) | AjAnubolu | #1735 |
| 1.0450 | Record: 1.0450 BPB — SGD TTT + HedgeMixer with Per-Layer LR Groups | dexhunter | #967 |
| 1.0461 | Podracing: 1.0461 BPB (3-seed mean) | newjordan | #674 |
| 1.0461 | Podracing: 1.0461 BPB (3-seed mean) — 5-gram eval + LeakyReLU² | newjordan | #706 |
| 1.0465 | Record: 11L XSA-all + 7-gram cache (mean val_bpb=1.0465) | hypery11 | #758 |
| 1.0467 | E2E TTT: End-to-End Test-Time Training with Meta-Learning (1.0467 BPB) | gowtham0992 | #872 |
| 1.0467 | E2E TTT: End-to-End Test-Time Training with Meta-Learning (1.0467 BPB) | gowtham0992 | #873 |
| 1.0487 | Record: pcloadloveletter v6 — Novel Codebook+Huffman Compression + AdamW TTT (val_bpb=1.0487) | NotADevIAmaMeatPopsicle | #532 |
| 1.0523 | Record: 11L XSA4 + Multi-Pass Streaming Score-First Legal TTT (3-seed mean val_bpb=1.0523) | Sarimsaljook | #573 |
| 1.0539 | Record: 8L Paid Prefix + SmearGate + Int6 (val_bpb=1.0539) | ibarrajo | #262 |
| 1.0539 | Non-record: Paid Prefix Research (val_bpb=1.0539, ruled out-of-scope) | ibarrajo | #275 |
| 1.0541 | Record Submission: 1.0541 BPB - 5-expert Hedge Mixer + CROWN-Q + stride=64 | RoyiRa | #700 |
| 1.0573 | Record: Casefold V4 + AttnOutGate + Multi-Phase Global SGD TTT — val_bpb 1.05733 (3-seed mean) | dexhunter | #1693 |
| 1.0574 | Record: 11L Sidecar48 + Enhanced Attention + Async Data Pipeline + AdamW TTT (20 epochs, cosine LR, 3-seed mean val_bpb=1.0573) | DeepReinforce | #684 |
| 1.0577 | SR-CM-P2Loss: 1.0577 bpb (~15.06MB) | estesryan | #1180 |
| 1.0585 | SAFE_SUBMISSION: run036-safe016 (1.05850 BPB) | joshkmartinez | #1624 |
| 1.0585 | SAFE_SUBMISSION: run036-safe016 (1.05850 BPB) | joshkmartinez | #1633 |
| 1.0587 | Record: SP8192 + Pre-Quant AdamW TTT + Compiled TTT — val_bpb 1.0587 (3-seed mean) | translatingthename | #1539 |
| 1.0587 | Non-record: Pre-Quant AdamW TTT (Compiled) + SP8192 + Depth Recurrence — val_bpb 1.0587 (3-seed mean) | translatingthename | #1550 |
| 1.0597 | Record: Casefold V4 Tokenizer + Multi-Phase Global SGD TTT — val_bpb 1.05970 (3-seed mean) | dexhunter | #1670 |
| 1.0600 | Record: SP8192 + Recur345 + Par7 + EMA + QK5.25 + Pre-Quant TTT 10ep — val_bpb 1.0600 (3-seed mean) | ndokutovich | #1487 |
| 1.0616 | SP8192 + SLOT-4 + TTT + 3-Layer Recurrence + Parallel Residuals (1.0616 BPB) | powerpratik | #1647 |
| 1.0622 | Record: 11L XSA4 + LeakyReLU(0.5)² + Cosine TTT 50ep (val_bpb=1.0622) | sofiabod | #518 |
| 1.0632 | Record: Depth Recurrence + Banked Muon + Pre-Quant TTT (18ep) — val_bpb 1.0632 (3-seed mean) | RulinShao | #1517 |
| 1.0639 | Record: Casefold Tokenizer + Parallel Residuals + Systems Optimization — val_bpb 1.0639 (3-seed mean) | codemath3000 | #1585 |
| 1.0651 | Record: CaseOps Tokenizer + Recurrence Depth Curriculum + Base Arch Stack — val_bpb 1.06505 | romeerp | #1756 |
| 1.0655 | Record: SP8192 + CaseOps + GatedAttn + QuantGate + Loop45 + PhasedTTT — val_bpb 1.06549 | dexhunter | #1736 |
| 1.0668 | Record: Custom Casefold Tokenizer — 1.0668 BPB | mikeapedia | #1578 |
| 1.0672 | Record: SwiGLU + XSA4 + U-Net + AdamW TTT (3-seed mean val_bpb=1.0672) | JoeProAI | #462 |
| 1.0678 | Record: CaseOps Tokenizer + Tapered WD - val_bpb 1.0678 (3-seed mean) | romeerp | #1729 |
| 1.0679 | Record: SP8192 + 3-Layer Depth Recurrence + Parallel Residuals + EMA + QK5 + Pre-Quant AdamW TTT — val_bpb 1.0679 (3-seed mean) | ndokutovich | #1485 |
| 1.0698 | Record: 11L Sidecar48 + Enhanced TTT (cosine LR, 20 epochs) — 1.0698 BPB (3-seed mean) | teddyoweh | #581 |
| 1.0705 | Record: AdamW TTT 30ep Cosine + Per-Layer LR (val_bpb: 1.0705) | sunnypatneedi | #771 |
| 1.0713 | Record(?): WARP (Word-Aware Representation Priors) — val_bpb 1.0713, 1xH100 10min, 13.65 MB | ahmetdenizyilmaz | #1252 |
| 1.0714 | RECORD: SmearGate + Attention Output Gate + Legal TTT — val_bpb=1.07139 | MarioPaerle | #1667 |
| 1.0717 | Record: 10L + 7-gram eval cache (mean val_bpb=1.0717) | hypery11 | #724 |
| 1.0719 | Record: VarLen Attention + Fused MLP + Multi-Phase Global SGD TTT — val_bpb 1.07193 (3-seed mean) | dexhunter | #1626 |
| 1.0722 | Record: SP8192 MP-SGD TTT (4 phases) + QK-Gain 5.25 — val_bpb 1.07217 (3-seed mean) | yahya010 | #1727 |
| 1.0722 | Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix | amrayach | #1740 |
| 1.0722 | Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix | amrayach | #1741 |
| 1.0722 | Add SP8192 Multi-Phase Global SGD + Phased TTT (1.07219 bpb) | jorge-asenjo | #1700 |
| 1.0722 | Record: 1.0722 BPB — Improved TTT + HedgeMixer with Per-Layer LR Groups | dexhunter | #953 |
| 1.0723 | [Submission] SP8192 FullStack PartialRoPE LeakyReLU - 2026-04-19 | sakthivarshans | #1737 |
| 1.0729 | Add phased global SGD TTT prefix submission | romeerp | #1610 |
| 1.0736 | Record: SP1024 + Pre-quant TTT + Parallel Residuals — 1.0736 BPB (beats 1.1147 by 3.66%) | joshkmartinez | #1489 |
| 1.0740 | Add SP10240 + FreqGPTQ + lowercase tokenization: 1.07399 BPB | nothingLiva | #1707 |
| 1.0741 | Record: VarLen Attention + Triton Fused MLP + Doc-TTT + Warmdown 0.75 + Chunk 48 — val_bpb 1.07406 (3-seed mean) | dexhunter | #1560 |
| 1.0742 | Recursive Transformer - Non-Record Submission — 1.07424983 val_bpb (4h depth-recurrent hybrid transformer run) | newjordan | #1535 |
| 1.0744 | Record: ImprovedParallelResiduals, 1.0744 BPB / 2.7752 nats, -0.0034 BPB / -0.0088 nats vs PR #1523 | msisovic | #1529 |
| 1.0744 | [Non-record] Experimentation Summary: Autopsy of 100+ Experiments — What Worked, What Didn’t, Mind Map for LLM Agents, etc. | SPThole | #1602 |
| 1.0745 | Record: 5-expert Hedge Mixer + TTT (3-seed mean val_bpb=1.0745) | RoyiRa | #687 |
| 1.0745 | Record: 5-expert Hedge Mixer + TTT (3-seed mean val_bpb=1.0745) | RoyiRa | #688 |
| 1.0746 | Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT + CaseOps Tokenizer — val_bpb 1.07462 (3-seed mean) | OE-GOD | #1755 |
| 1.0749 | Record: Per-Layer Adaptive GPTQ Clip + int7 Embeddings + MLR 0.026 — val_bpb 1.07493 (3-seed mean) | dexhunter | #1586 |
| 1.0752 | Record: Improved Parallel Residuals + Systems Optimization — val_bpb 1.0752 (3-seed mean) | codemath3000 | #1584 |
| 1.0756 | Non-record: xIELU Piecewise Quadratic Activation + Per-Layer QK Gain Convergence | mikeapedia | #1648 |
| 1.0759 | [Record] Stage 3 + SpinQuant V1 + MP-SGD-TTT — val_bpb 1.0759 | X-Abhishek-X | #1695 |
| 1.0764 | Record: TMA Megakernel + Improved Parallel Residuals + Tap-In min_match=1 — val_bpb 1.07636 (3-seed mean) | andrewbaggio1 | #1555 |
| 1.0764 | Record: Varlen attention + fused MLP + doc-independent TTT (1.07643) | samacqua | #1530 |
| 1.0766 | Record: SP4096 + Depth Recurrence + Parallel Residuals + Causal SLOT-16 — val_bpb 1.0766 (3-seed mean) | aryanbhosale | #1333 |
| 1.0771 | Non-record: Neural Base Model, No TTT — Parcae + Gates + Layered Windows (val_bpb 1.07706) | mikeapedia | #1728 |
| 1.0773 | Record: SP8192 + Improved Parallel Residuals + Muon 0.97 + TTT 5ep + N-gram Tilt + Hessian SDClip — val_bpb 1.07730 | ndokutovich | #1557 |
| 1.0775 | Record: SP8192 + VarLen Attention + Doc-Independent LoRA TTT + Banking + Muon 0.97 — val_bpb 1.07747 (3-seed mean) | dexhunter | #1536 |
| 1.0777 | Record: SP8192 + VarLen Attention + LoRA TTT + Fused MLP — val_bpb 1.0777 (3-seed mean) | aryanbhosale | #1540 |
| 1.0778 | Record: SP8192 + Triple Recurrence + Banking + Fused MLP + Muon 0.97 — val_bpb 1.0778 (3-seed mean) | EthanYangTW | #1523 |
| 1.0778 | Record: SP8192 + Improved Parallel Residuals + Muon 0.97 + LR 0.03 + Legal TTT — val_bpb 1.07785 (3-seed mean) | bigbag | #1541 |
| 1.0780 | Record: SP8192 + Parallel Residuals + 3-Layer Recurrence + Legal N-gram Tilt — val_bpb 1.07800 (3-seed mean) | dexhunter | #1437 |
| 1.0781 | Record: 30ep Cosine TTT on LeakyReLU² stack (3-seed mean val_bpb=1.0781) | andrewbaggio1 | #672 |
| 1.0783 | Record: SP8192 + Triple Recurrence + Banking + Fused MLP + Muon 0.97 — val_bpb 1.0783 (3-seed mean) | EthanYangTW | #1561 |
| 1.0785 | Record: SP8192 + Hadamard Rotation + AWQ + Layer-wise Precision + Hessian-Aware Calibration + Legal TTT — val_bpb 1.0785 (3-seed mean) | Victory963 | #1731 |
| 1.0785 | Record: SP8192 + Hadamard Rotation + AWQ + Layer-wise Precision + Hessian-Aware Calibration + Legal TTT — val_bpb 1.0785 | Victory963 | #1732 |
| 1.0787 | Record: SP8192 + Pre-Quant TTT (QK 5.25, 8ep, freeze-1) — val_bpb 1.0787 (3-seed mean) | aamodbhatt | #1482 |
| 1.0788 | Record-track: Trajectory-State Readout + Muon 0.98 + Legal TTT (1.0788) | aazizyan | #1676 |
| 1.0788 | Record: SP8192 + BigramHash d=32 + Path A v3 passthrough quantization — val_bpb 1.07882 (3-seed mean) | himanshudongre | #1716 |
| 1.0788 | Non-record: Eval-time lever ablations on SP8192 absolute-RoPE stack (companion to PR #1716) | himanshudongre | #1718 |
| 1.0788 | Record: Wider Loop + Per-Pass Embeddings + Tap-In V6 + Legal TTT (1.078825 3-seed mean) | abaybektursun | #1518 |
| 1.0790 | Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT — val_bpb 1.0790 (5-seed mean) | aryanbhosale | #1533 |
| 1.0791 | Record: SP8192 + Pre-Quant TTT + QK-Gain 5.0 + Depth Recurrence + MuonEq-R — val_bpb 1.0791 (3-seed mean) | aryanbhosale | #1423 |
| 1.0795 | Record: SP8192 + Pre-Quant TTT — val_bpb 1.07948 (3-seed mean) | erichroepke | #1416 |
| 1.0797 | Record: SP8192 + Depth Recurrence x2 + GPTQ + Score-First TTT + fused-softcap-ce -- val_bpb 1.07974 (3-seed mean) | anthony-maio | #1572 |
| 1.0798 | Record: SP8192 + Muon 0.97 + Legal Score-First TTT — val_bpb 1.07983 (3-seed mean) | dexhunter | #1514 |
| 1.0799 | Non-record: SP8192 + LoRA on tied embedding (1.07994, 1 seed) | yijieyuan | #1759 |
| 1.0800 | Record: dTTT + BigramHash 3072×112 — val_bpb 1.0800 (3-seed mean) | aamodbhatt | #1408 |
| 1.0801 | Record: SP8192 + Systems Optimization — val_bpb 1.0801 (3-seed mean) | codemath3000 | #1583 |
| 1.0801 | Record: Triple Loop + Fused Kernels + Parallel Residuals + N-gram Tilt; val_bpb 1.08014 (5-seed mean) | abaybektursun | #1420 |
| 1.0802 | Record: SP8192 + Muon 0.97 + 3-Layer Recurrence + Parallel Residuals + TTT — val_bpb 1.0802 (3-seed mean) | aryanbhosale | #1521 |
| 1.0803 | Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT + Asynchronous Data Loader - val_bpb 1.0803 | nogakeren | #1532 |
| 1.0805 | Non-Record: Polar Express Muon negative result (1.0805 BPB, +0.0004 vs standard NS5) | dexhunter | #1516 |
| 1.0806 | Scylla (novel tokenizer) + Legal Score-First TTT (val_bpb: 1.08056553) | simon-marcus | #1143 |
| 1.0807 | Record: Discriminative TTT — val_bpb 1.0807 (3-seed mean) | resouer | #1351 |
| 1.0809 | Add SP8192 qkramp05 + par-residual L6 + legal TTT systems rerun (1.080885 seed 42) | Buld1n | #1688 |
| 1.0809 | Add W104 SP8192 LegalTTT record candidate | teslaeco | #1750 |
| 1.0809 | Record: QK-Gain 5.5 — val_bpb 1.0809 (3-seed mean) | G3sparky | #1715 |
| 1.0810 | Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean) | bigbag | #1492 |
| 1.0810 | Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean) | bigbag | #1493 |
| 1.0810 | Submission: SP8192 + DepthRecur + MuonEq-R + SGD-TTT + SDClip GPTQ + Brotli-11 | AVINASH0052 | #1658 |
| 1.0812 | Add non-record SP8192 pass-gated recurrence submission | Buld1n | #1697 |
| 1.0813 | Non-record: SP8192 D 5-seed base and R-series evidence package | amrayach | #1598 |
| 1.0813 | Add near-SOTA SP8192 LegalTTT 3-seed reproduction | teslaeco | #1725 |
| 1.0815 | Non-record: SP8192 + QK5.4 + Legal Score-First TTT(4) — val_bpb 1.08149 (seed 1337) | aamodbhatt | #1706 |
| 1.0818 | Notable: SP8192 + 3-Layer Recurrence + Parallel Residuals - 5-Seed Quantization Reference and SDClip Ablations | kiyoaki | #1720 |
| 1.0819 | Record: PROTEUS v1.6 — Scylla + Parallel Residuals + Depth Recurrence + Legal TTT — val_bpb 1.0819 (3-seed mean) | MatoTeziTanka | #1289 |
| 1.0820 | Submission: SP8192 + Partial RoPE (16/64) + GPTQ SDClip + SGD TTT — val_bpb 1.0820 (3-seed mean) | swapp1990 | #1747 |
| ★ | 1.0822 | Record: SP8192 + Parallel Residuals + Score-First TTT — val_bpb 1.0822 (3-seed mean) | aryanbhosale | #1477 |
| 1.0822 | SP8192 + Adaptive Hessian-Sensitivity GPTQ Clipping — 1.0822 bpb | chris-colinsky | #1689 |
| 1.0824 | SP8192 + Gated Attention + NorMuon + Norm-PCT-Dropout + Legal TTT — val_bpb 1.0824 | taka6745 | #1520 |
| 1.0827 | Record: SP8192 + TTT + Eval-Time Hash Embedding — val_bpb 1.08269 (3-seed mean) | resouer | #1460 |
| ★ | 1.0828 | Record: SP8192 + QK-Gain 5 + Legal Score-First TTT — val_bpb 1.08279 (3-seed mean) | dexhunter | #1413 |
| 1.0829 | Notable Non-Record: Switched Deep Supervision (first DS submission) | channyzf6 | #1629 |
| 1.0832 | Adaptive Test-Time Training (TTT) with continuous LR-scaling. | kunwar-vikrant | #1639 |
| 1.0832 | Add Adaptive TTT experiment results | kunwar-vikrant | #1638 |
| 1.0832 | Add non-record SP8192 Parcae trajectory readout submission | Buld1n | #1703 |
| ★ | 1.0835 | Non-record: Parallel Residuals + Hessian-Aware SDClip (3-seed mean 1.08354 BPB) | Robby955 | #1412 |
| 1.0842 | [Record Submission] SP8192 + QK5 + Legal TTT — val_bpb 1.0842 | 15.99MB | aryan-cs | #1476 |
| 1.0845 | Non-record: QK4 Legal TTT Reproduction (1.08449 BPB) | N10ELabs | #1730 |
| 1.0846 | SP4096 + Depth Recurrence + Parallel Residuals + Legal N-Gram | someone114514 | #1534 |
| 1.0846 | Record: Causal SLOT + Pre-quant TTT — val_bpb 1.0846 (3-seed mean) | resouer | #1306 |
| 1.0848 | Record: TMA Megakernel + Triple Loop + Parallel Residuals — val_bpb 1.08480 | andrewbaggio1 | #1450 |
| 1.0850 | [non record] Investigating the Tied-Embedding Bottleneck: Why Boundary Blocks Underperform and What It Means for 16MB Models | SPThole | #1546 |
| 1.0853 | [Non-Record] Extended Compute Scaling Analysis: 1.0853 BPB at 50K steps (11.5 hours) on 4×A100MIG | OnlyJundong | #1005 |
| 1.0854 | Lucky V — 1.08540457 val_bpb (seed 444) | newjordan | #1322 |
| 1.0855 | Add: 11L Complement Training + TTT + No-JEPA submission (val_bpb 1.0855) | BoxiYu | #1257 |
| 1.0856 | Record: Scylla + GPTQ + BH3072 — val_bpb 1.0856 (3-seed mean) | anthony-maio | #1405 |
| ★ | 1.0856 | Record: SP8192 + GPTQ Embeddings + Depth Recurrence + MuonEq-R + SDClip — val_bpb 1.08563 (5 seed mean) | clarkkev | #1394 |
| 1.0857 | Record-track submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5 | ymrohit | #988 |
| 1.0857 | SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.5 + SGD-TTT [LoRA-TTT Future Work] | Anakintano | #1714 |
| 1.0858 | Non-record: Extended Compute Scaling Analysis (50K steps, 1.0858 BPB, 3 seeds (each run ~12 hours on 4xA100MIG)) | OnlyJundong | #1424 |
| 1.0858 | Record: SP8192 + 3-Layer Recurrence + Hard Onset — val_bpb 1.0858 | pablinga19 | #1660 |
| 1.0862 | Record: SP8192 + 3-layer recurrence + hard onset — val_bpb 1.0862 (3-seed mean) | pablinga19 | #1662 |
| 1.0862 | Record: SP8192 + 3-Layer Recurrence + Hard Onset — val_bpb 1.08625 (3-seed mean) | pablinga19 | #1663 |
| 1.0865 | Record: Loqui Auris — 10L + LoRA TTT (mean val_bpb=1.0865, 2 seeds) | LoquiAuris | #548 |
| 1.0866 | [Record] SP8192 + SDClip + 3-Layer Depth Recurrence + EMA 0.9965 — val_bpb 1.0866 | X-Abhishek-X | #1471 |
| 1.0872 | Non-Record: SP8192 + LeanICQ Compose at Int3 — val_bpb 1.08720 / 15.88 MB | dexhunter | #1515 |
| 1.0876 | Record: Scylla + Parallel Residuals + Depth Recurrence + Legal TTT — val_bpb 1.0876 (3-seed mean) | MatoTeziTanka | #1274 |
| 1.0881 | Add non-record submission: SP8192 baseline + LZMA code-wrap | upascal | #1754 |
| 1.0887 | Record: 11L Depth Recurrence + Discriminative Pre-Quant TTT (8xH100) — val_bpb 1.0887 (3-seed mean) | aamodbhatt | #1406 |
| 1.0889 | [submission] SP8192 + QK5 + Freeze10 Loss-Gated Legal TTT (1.08885521) | MuhammedErinArchitecture | #1744 |
| 1.0889 | Non-record: 4-Hour Progressive Depth — val_bpb 1.0889 | iverbovoy | #895 |
| 1.0889 | [Record] 3-Layer Depth Recurrence + EMA 0.9965 + WD 0.095 — val_bpb 1.0889 | X-Abhishek-X | #1445 |
| 1.0896 | Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R + Legal TTT — val_bpb 1.0896 (3-seed mean) | aryanbhosale | #1326 |
| 1.0896 | GatedAttn + ValueResid + XSA6 + HedgeMixer + Legal TTT — val_bpb: 1.08965 (3-seed mean) | sahiee-dev | #824 |
| ★ | 1.0897 | Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R + QK-Gain 5.0 — val_bpb 1.0897 (3-seed mean) | aryanbhosale | #1334 |
| 1.0898 | Record: Pre-Quant TTT + ETLB: Eval-Time Logit Bias for Neural Language Model Compression 1.0898 BPB on PR #1285 base | AnubhavBharadwaaj | #1399 |
| 1.0900 | Record: MuonEq-R + 3-Layer Recurrence + WD=0.095 + MLR=0.022 + All-Int6 — val_bpb 1.0900 (3-seed mean) | dexhunter | #1331 |
| 1.0903 | Record: Scylla + n-gram + legal TTT — val_bpb 1.0903 (3-seed mean) | Campbellb | #1242 |
| 1.0909 | Record: 9L XSA-all + LeakyReLU² + 5-gram eval cache — val_bpb 1.0909 (3-seed mean) | resouer | #740 |
| 1.0909 | Non-record: Int5 GPTQ + Wider MLP | sergeevii123 | #1646 |
| ★ | 1.0912 | Record: MuonEq-R + Depth Recurrence + WD=0.090 + All-Int6 GPTQ — val_bpb 1.0912 (3-seed mean) | dexhunter | #1285 |
| 1.0913 | Record: SP4096 + 3-Layer Recurrence + GPTQ Embeddings + SDClip + ETLB — val_bpb 1.0913 (3-seed mean) | bigbag | #1415 |
| 1.0916 | Add 11L Shared Sparse Sidecar + EMA + AdamW TTT (1.0916 mean) | ymrohit | #555 |
| 1.0920 | Record: 5-gram Eval Cache + LeakyReLU² + Parallel Muon val_bpb: 1.0920 (3-seed mean, std 0.0007) | ~15.9 MB | 8×H100 SXM | deanbrr | #659 |
| 1.0920 | Non-record: 11L GEPA + 30k Steps + Pure Int6 + Legal TTT (val_bpb=1.0920) | Christopher-Lee-McClendon | #668 |
| 1.0923 | Record: SP4096 + Polar Express + MuonEq-R + Depth Recurrence — 1.0923 BPB (3-seed) | Omrigotlieb | #1344 |
| 1.0924 | Record: MuonEq-R + Depth Recurrence + N61 Mixed GPTQ — val_bpb 1.0924 (3-seed mean) | dexhunter | #1279 |
| 1.0924 | Record: SP4096 + Linear LR + Depth Recurrence -- val_bpb=1.0924 (3-seed mean) | dttdrv | #1395 |
| 1.0925 | Record: Vocab4096 + MLP4.0x + SLOT - val_bpb 1.0925 (3-seed mean) | dentity007 | #1291 |
| 1.0925 | [Record] 11L Depth Recurrence + EMA Tuning (0.9965) — val_bpb 1.0925 | X-Abhishek-X | #1421 |
| 1.0926 | Record: SP4096 + Depth Recurrence + MuonEq-R + Full GPTQ — val_bpb 1.0926 (3-seed mean) | aryanbhosale | #1296 |
| 1.0929 | feat: Non-record 11L PR940 Stack (no n-gram in use) + 20k Steps + Legal TTT (1.0929 BPB) | Christopher-Lee-McClendon | #1232 |
| 1.0929 | Record: MuonEq-R + Depth Recurrence + Mixed Int5/Int6 GPTQ — val_bpb 1.0929 (3-seed mean) | dexhunter | #1260 |
| 1.0941 | Submission: 1.0941 BPB by David Weyh | Upsalla | #622 |
| 1.0944 | Non-record: 11L GEPA + 25k Steps + Pure Int6 + Legal TTT (val_bpb=1.0944) - unlimited compute category | Christopher-Lee-McClendon | #644 |
| 1.0945 | N-gram Cache + Entropy-Adaptive Alpha: 1.0945 BPB | danielxmed | #1026 |
| 1.0955 | Record: SP2048 + 3-Layer Recurrence + SWA + BigramHash + Legal TTT — val_bpb 1.0955 (3-seed mean) | bigbag | #1338 |
| 1.0955 | Record: SP2048 + 3-Layer Recurrence + SWA + BigramHash + Legal TTT — val_bpb 1.0955 (3-seed mean) | bigbag | #1339 |
| 1.0959 | Record: SP4096 + Polar Express NS + MuonEq-R + WD=0.090 — 1.0959 BPB (3-seed mean) | Omrigotlieb | #1332 |
| 1.0960 | Non-record: 11L gated Krylov + AR GPTQ int6 + lzma, 1.09596 BPB | LauraGomezjurado | #1446 |
| 1.0960 | Non-record: Extended Compute Scaling Analysis (20K steps, 1.0960 BPB, 3 seeds (each run ~6 hours on 4xA100MIG)) | OnlyJundong | #1407 |
| 1.0962 | Record: QK-Gain 4.0 + XSA-11 + Muon-TTT + SLOT — val_bpb 1.0962 (3-seed mean) | bigbag | #1176 |
| 1.0963 | Lucky IV — 1.09626897 val_bpb (seed 444) | newjordan | #1286 |
| 1.0970 | Record: Cosine TTT scheduling with per-layer lr — mean val_bpb=1.0970 (3 seeds) | mrdavtan | #481 |
| 1.0970 | Record: VRL + Full GPTQ + 5-gram Cache + Hidden-State kNN-LM (3-seed mean val_bpb=1.0970) | gowtham0992 | #738 |
| 1.0970 | SP8192_LayerRecur_ParResid_QK525_paramOpt | yufang67 | #1570 |
| 1.0976 | Add non-record SP8192 Parcae prefix TTT readout-only submission | Buld1n | #1704 |
| ★ | 1.0978 | Record: 4096-Vocab + 4.0-MLP-mult + 0.085-WD + Simplifications — val_bpb 1.09785 (3-seed mean) | clarkkev | #1218 |
| 1.0980 | Record: 11L Depth Recurrence + BigramHash + EMA 0.9965 — val_bpb 1.0980 (3-seed mean) | AbhayAnandUCSD | #1435 |
| 1.0980 | WIP: Sequential GPTQ with Groupwise Int6 — improved post-training quantization on SP4096 base | zoharb157 | #1664 |
| 1.0983 | Non-record: 11L GEPA + 20k Steps + Pure Int6 + Legal TTT (val_bpb=1.0983): unlimited compute: 4×A100-40GB, ~2.8 hours | Christopher-Lee-McClendon | #628 |
| 1.0988 | PR #414 + 30-Epoch Cosine TTT (1.0988 BPB) | xexyz | #691 |
| 1.0996 | GDN-Hybrid + Legal Score-First TTT + Full-Hessian GPTQ Int6 | gracebml | #1749 |
| 1.1000 | Non-Record: TTT and GPTQ Are Fundamentally Incompatible — Quantized Weight Structure Defeats Test-Time Adaptation | himanshudongre | #1341 |
| 1.1015 | Record: SLOT + Split-LR + Full GPTQ + XSA-all — val_bpb 1.1015 (3-seed mean) | dexhunter | #1172 |
| 1.1016 | Add non-record SP8192 trajectory readout submission | Buld1n | #1701 |
| 1.1020 | Add record: SP4096 + Depth Recurrence + Parallel Residuals + QK-Gain + Brotli (1.1020 BPB) | Its-Just-Crump | #1392 |
| 1.1025 | Record: Pre-quant AdamW TTT + QK-Gain 4.0 — val_bpb 1.1025 (3-seed mean) | stukenov | #1364 |
| 1.1026 | [Submission] EngramLite + Mousse + Progressive Depth Recurrence + TTT — val_bpb 1.1026 | 15.95MB | 8×H100 | Mertyandimata | #1440 |
| 1.1027 | Record: 11L EMA + AdamW TTT 10ep (mean val_bpb=1.1027) | sjp611 | #442 |
| 1.1027 | Record: MuonEq-R + Context-Only SLOT + QK_GAIN=5.0 — val_bpb 1.1027 (3-seed mean) | bigbag | #1217 |
| 1.1035 | Record: Hadamard-Rotated GPTQ + dTTT + Recur2 (1.1035 BPB) | tmancino | #1400 |
| 1.1035 | Slot Machine — 1.10350531 val_bpb (seed 444) | newjordan | #1282 |
| 1.1036 | val_bpb 1.1036 - 12L sp9000 + depth recurrence + hash-TTT | Idan3011 | #1565 |
| 1.1043 | Record: Polar Express NS + SLOT + MuonEq-R + XSA-all — 1.1043 BPB (3-seed mean) | Omrigotlieb | #1297 |
| 1.1043 | Record: Polar Express NS + SLOT + MuonEq-R + XSA-all — 1.1043 BPB (3-seed mean) | Omrigotlieb | #1298 |
| 1.1047 | Non-Record: SP1024 + Depth Recurrence + Adaptive Markov Curriculum + Auto-QMax GPTQ + TTT — val_bpb 1.1047 | Mertyandimata | #1397 |
| 1.1047 | Non-Record: SP1024 + Depth Recurrence + Adaptive Markov Curriculum + Legal TTT — val_bpb 1.1047 | Mertyandimata | #1398 |
| 1.1048 | Record: Vocab4096 + MLP4.0x + WD0.085 - val_bpb 1.1048 (3-seed mean) | dentity007 | #1287 |
| 1.1056 | Non-record: XSA-11 + Parallel Residual (L7+) + Depth Recurrence — val_bpb 1.1056 (1-seed, 1×H100) | PhamPhuHoa-23 | #1467 |
| 1.1057 | Midnight 12L — 1.10567949 val_bpb (seed 444) | newjordan | #1458 |
| ★ | 1.1063 | Record: ParallelResiduals + MiniDepthRecurrence, 1.1063 BPB / 1.8679 nats, -0.0072 vs PR #1179, -0.0143 vs merged SOTA | msisovic | #1204 |
| 1.1063 | Non-record: TWEO early-cosine outlier regularization on SP1024 baseline | PapaFranku4647 | #1635 |
| 1.1064 | Record: Full GPTQ + Score-First TTT + SLOT — val_bpb 1.1064 (3-seed mean) | andrewbaggio1 | #1209 |
| 1.1064 | Non-record: Does SLOT violate causal dependence? (empirical test + question) | andrewbaggio1 | #1240 |
| 1.1067 | Record: Combined 3-Layer Recurrence + Parallel Residuals + Polar Express + Brotli — val_bpb 1.1067 (3-seed mean) | erichroepke | #1396 |
| 1.1070 | Record: XSA + LoRA TTT (val_bpb=1.1070) | Elarwei001 | #1254 |
| 1.1077 | Add non-record submission: 12L 24min Vocab1792 FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ MixedBits | shram86 | #1495 |
| 1.1078 | Record Submission: 1.1078 BPB — XSA6 + BigramHash4K on Hedge Mixer Stack | agalimova | #720 |
| 1.1078 | Record: Split-LR + N-gram Agreement + Full GPTQ — val_bpb 1.1079 (3-seed mean) | vlivashkin | #1302 |
| 1.1079 | Non-record: 11L GEPA + 12k Steps + Pure Int6 + Legal TTT (val_bpb=1.1079) | Christopher-Lee-McClendon | #612 |
| 1.1084 | Record: PR #1105 + window attn + mixed seq_len — 1.1084 bpb (3-seed mean) | Gusanidas | #1219 |
| 1.1085 | 1.1085 BPB: JEPA + AdamW TTT + Full GPTQ + FA3 + LZMA | NewyorkDev | #1006 |
| 1.1086 | Record Submission: 1.1086 BPB - Turbo-Muon + EngramLite + ParamBanking (11L 512d) | mikeapedia | #1089 |
| 1.1088 | Non-Record: Unified Attention + FA3 + 1hr training (val_bpb=1.1088) | VirajDeshwal | #1270 |
| 1.1090 | [Notable Non-Record Submission] 1.1090 BPB - 74.3M Ternary U-Net Transformer (100k steps/3h) | CiprianFlorin-Ifrim | #923 |
| 1.1091 | review: Rerun of PR #1089 | AnirudhRahul | #1126 |
| 1.1092 | add varlen+fused mlp+ttt record | samacqua | #1354 |
| 1.1092 | Add non-record SP8192 tempered BPB-weighted loss submission | Buld1n | #1702 |
| 1.1093 | Record: 15L Depth Recurrence + LeakyReLU² + Cosine TTT (3-seed mean val_bpb=1.1093) | aruniyer | #857 |
| 1.1098 | review: Rerun of PR #1120 (Rascal) on 8xH100 SXM | dexhunter | #1177 |
| 1.1099 | val_bpb 1.1099 (3-seed mean) Rascal | newjordan | #1120 |
| 1.1100 | Record: Loqui Auris — 10L + SWA + Standard TTT (val_bpb=1.1100) | LoquiAuris | #595 |
| 1.1100 | Non-record: Comprehensive Negative Results — What Doesn't Work on Strong Models | andrewbaggio1 | #1272 |
| 1.1100 | Record: MuonEq-R + Context-Only SLOT + XSA-all + QK-Gain 5.0 | BiggerDABOSS | #1276 |
| 1.1100 | Submission/epsilon flashonly 2026 04 06 | teerthsharma | #1401 |
| 1.1101 | Add 07c1 strict RunPod base submission | amrayach | #1307 |
| 1.1101 | Record: 11L TrigramHash + ValueResidual + GradQuant + Cosine TTT (mean val_bpb=1.0887, best 1.0879) | ndokutovich | #486 |
| 1.1104 | Record: Depth Recurrence + MuonEq-R + AR Self-Gen GPTQ — val_bpb 1.1104 (3-seed mean) | aryanbhosale | #1290 |
| 1.1104 | [Non-record] E2E TTT at 27M scale — negative result (val_bpb 1.1104, SP1024) | ChideraIbe123 | #1625 |
| 1.1104 | Non-record: 11L s2048 4h on 1xA100 — 1.1104 BPB | xiehuanyi | #1528 |
| 1.1105 | Record: 11L Int5 + 6-Expert HedgeMixer + LeakyReLU(0.9)^2 + TTT (val_bpb=1.1105) | dttdrv | #849 |
| 1.1105 | Record: Split-LR + BigramHash(2816x160) + Full GPTQ + Brotli — val_bpb 1.1105 (3-seed mean) | dexhunter | #1179 |
| 1.1108 | Record: Window Attention + Mixed Seq_Len Training, bpb 1.1108, eval at 6144 (5-seed mean) | Gusanidas | #1212 |
| 1.1109 | Record: 1.1109 BPB Loader FullGPTQ XSA11 + online ngram augment | AnirudhRahul | #1145 |
| 1.1111 | val-only 10min record (val_bpb:1.1111) | daniellawson9999 | #44 |
| 1.1116 | Record: Fused Triton MLP + Full GPTQ + Coprime Loader + XSA-all + BH2816 (val_bpb 1.1116) | barneywohl | #1135 |
| 1.1117 | Record: Bank QAT + seq4096 + SWA w=256 + QK-Gain 2.5 + PKO — val_bpb 1.1117 (3-seed mean) | Itssshikhar | #1512 |
| 1.1123 | Record: 1.1123 BPB — Coprime-Stride Loader + Full GPTQ + XSA-all (3-seed mean) | dexhunter | #1060 |
| 1.1124 | Record: Aggressive SGD TTT (3-seed mean val_bpb=1.1124) | fielding | #757 |
| 1.1126 | Turbo-Muon + EngramLite + ParamBanking + GPTQ Reserve Opt — val_bpb 1.1126 (3-seed mean) | Bortlesboat | #1169 |
| 1.1129 | Add non-record AR GPTQ XSA ROTQ Hadamard submission | vermissa0ss | #1224 |
| 1.1130 | Sota 11 l submission | malc3om | #1077 |
| 1.1131 | Add in-progress non-record submission for Legal TTT + Muon+ + QK Gain 4.0 | scottcui-georgian | #1645 |
| 1.1133 | Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1133 (3-seed mean) | Bortlesboat | #1099 |
| 1.1135 | Record: SP4096 + Compressibility Regularization — val_bpb 1.11349 (6-seed mean) | jpfeiffe | #1508 |
| 1.1135 | 14L QEP GPTQ + Per-Window SGD TTT (1.1135 BPB, 3-seed) | himanalot | #1655 |
| 1.1136 | Non-record: 11L XSA-All + EMA + Legal GPTQ on 8xH100 (1.11355 BPB) | Rtx09x | #1694 |
| 1.1140 | Record: 1.1140 BPB — ResidLambdas + Split-LR + Train-Budget GPTQ + Coprime Loader (12-seed mean) | Gusanidas | #1130 |
| 1.1142 | Record: Val-Calibrated GPTQ + XSA-all + BigramHash 3072×112 | abaybektursun | #728 |
| 1.1142 | Non-record: Negative results — quantization algorithms & TTT on val-GPTQ stack | abaybektursun | #756 |
| 1.1143 | Non-record: Exact Sequence Matching on PR #1019 (1.1143 BPB) | cadenmcmann | #1309 |
| 1.1144 | Ultimate: GatedAttn + ValueResidual + Full QAT + lzma-9 + BigramHash(2048) | FlashyFlash3011 | #952 |
| 1.1145 | Record: 33.6M Int5 GPTQ + Score-First TTT (val_bpb=1.1145, 3-seed) | ibarrajo | #991 |
| 1.1145 | 1.1145 BPB: Parallel Muon + INT5 GPTQ + Legal TTT | EthanYangTW | #1171 |
| 1.1146 | Record: EngramLite + Gated Skips + Full GPTQ + FA3 — val_bpb 1.1146 (1-seed, 2 pending) | icryo | #1122 |
| 1.1146 | BPB-weighted training loss: align training objective with eval metric | elliottdehn | #1519 |
| 1.1147 | [Non Record] Learn to Learn: Meta-Learning-TTT Redesign — Cross-Chunk FOMAML + Delta-Loss + MetaSGD | SPThole | #1502 |
| 1.1147 | Memmap multi-shard data pipeline + GPU prefetch + LeakyReLU² + Legal TTT + Parallel Muon | DeepReinforce | #726 |
| ★ | 1.1147 | Record: AR Self-Gen GPTQ + XSA-all + BigramHash 3072×112 — val_bpb 1.11473 (3-seed mean) | abaybektursun | #1019 |
| 1.1147 | Non-record: Negative results — eval-time interventions, mixed-precision GPTQ, loss truncation | abaybektursun | #1103 |
| 1.1147 | WIP: Depth Recurrence via Weight-Shared Transformer Blocks | GitGeeks | #1278 |
| 1.1151 | Legal TTT (SGD, 3-epoch) + SLOT (lr=0.003, steps=5) on PR #549 base -- val_bpb: 1.11512 (3-seed mean, beats merged SOTA 1.1194) | sahiee-dev | #1150 |
| 1.1154 | Non-record: 11L XSA-all + Full GPTQ + Selective Pruning (val_bpb=1.1154, 3-seed) | saml212 | #609 |
| 1.1154 | Record: SLOT + LeakyReLU² + Legal Score-First TTT + Parallel Muon — val_bpb 1.1154 (3-seed mean, std 0.0002) | ~15.9 MB | 8×H100 SXM | AnubhavBharadwaaj | #1128 |
| 1.1156 | Record: AR Self-Gen GPTQ + XSA-11 + BigramHash3072x112 (mean 1.1156) | aamodbhatt | #1280 |
| 1.1156 | Non-record: 11L FullGPTQ + XSA-all + BigramHash 3072×112 — val_bpb 1.11564 (1-seed) | AVINASH0052 | #1473 |
| 1.1156 | Submission/sp8192 depthrecur adamwttt | AVINASH0052 | #1619 |
| 1.1157 | Add 11L XSA11 + BigramHash3072 + AdamW Legal TTT submission | someone114514 | #841 |
| 1.1158 | Full GPTQ + XSA-all + SWA/EMA (val_bpb=1.1158, 3-seed mean=1.1163) | Robby955 | #639 |
| 1.1158 | Record: 11L LatentMask TTT + GPTQ + Product-Key Bigram + Brotli — val_bpb 1.1158 (3-seed mean) | izlley | #1410 |
| 1.1159 | [Non Record] Learn to Learn: Position-Conditional Bigram Hashing + Meta-Learning + TTT Ablation | SPThole | #1501 |
| 1.1160 | Non-record: 10L + Batched LoRA TTT (val_bpb=1.1160) | hypery11 | #525 |
| 1.1160 | Non-record: 10L + Batched LoRA TTT (val_bpb=1.1160) | hypery11 | #557 |
| 1.1161 | Record: EGGROLL v2 — val_bpb 1.1161 (3-seed mean, std 0.0001) | haikosys | #1156 |
| 1.1162 | Record: int5 GPTQ + Soft-Round QAT (3-seed mean 1.1162) | EthanYangTW | #606 |
| 1.1163 | Record: Full GPTQ + LeakyReLU² + Parallel Muon + BigramHash 3072 (val_bpb 1.1163, 3-seed mean) | abaybektursun | #593 |
| 1.1163 | Stable Growing Recurrence: Progressive Depth + Error Feedback (non-record) | nestamidavaine | #1230 |
| 1.1163 | Non-record: Stable Growing Recurrence, Progressive Depth + Error Feedback | nestamidavaine | #1231 |
| 1.1164 | Record: Train Larger, Quantize Harder - 33.6M params + int5 GPTQ / (val_bpb: 1.1164) | cmcdnd | #576 |
| 1.1164 | Record: 11L XSA-all + LeakyReLU(0.5)² + VR + GA (val_bpb=1.1164, pending 3-seed) | Asukabot0 | #638 |
| 1.1169 | Record: Residual Input Mixing + mixed int6 GPTQ + grouped TTT + MLP 3.5x | danialht | #615 |
| 1.1170 | Record: Fused LeakyReLU² + Online GPTQ + Parallel Muon — val_bpb 1.117 (1-seed) | vimeto | #1072 |
| 1.1171 | Non-record: PR703 + shard-order curriculum + GPTQ cache-backout (1.1171) | petergpt | #783 |
| 1.1171 | Record: 11L XSA-all + Full GPTQ + Parallel Muon + Selective Pruning (val_bpb: 1.1171) | raahilshah | #634 |
| 1.1171 | Non-record: Negative results — hardware alignment & quantization on 8xH100 | abaybektursun | #670 |
| 1.1172 | Record: Residual Input Mixing + mixed int6 GPTQ + grouped TTT + MLP 3.5x | danialht | #790 |
| 1.1172 | Non-record: Depth Recurrence + GPTQ + SGD TTT (1.1172, 1xH100) | swapp1990 | #1422 |
| 1.1174 | Record: CROWN-Q + GPTQ + Legal TTT — val_bpb 1.1174 (3-seed mean) | EthanYangTW | #1129 |
| 1.1175 | Non-record: Cosine TTT 30ep on SwiGLU + U-Net (1xH100, val_bpb=1.1175) | andrewbaggio1 | #509 |
| 1.1175 | Record: 11L VRL + LeakyReLU² + Full GPTQ (3-seed mean val_bpb=1.1175) | gowtham0992 | #569 |
| 1.1175 | Non-record: 30ep Cosine TTT on SwiGLU + U-Net (1xH100, val_bpb=1.1175) | andrewbaggio1 | #661 |
| 1.1176 | Record: PR549 + MiLe decay + 8-bit Muon + 1.04x LR + Cache+Backout — val_bpb 1.1176 | Gusanidas | #703 |
| 1.1177 | Non-record: Exact Sequence Matching + TTT on PR #549 (1.1177 BPB) | cadenmcmann | #1310 |
| 1.1178 | Record: Late Soft-Round QAT + Score-First Backward-Looking TTT — val_bpb 1.1178 | RoyiRa | #589 |
| 1.1179 | int5 GPTQ + 33.6M model: 1.1179 BPB (3-seed mean) | EthanYangTW | #544 |
| 1.1179 | Record: int5 GPTQ + 33.6M model (3-seed mean val_bpb=1.1179) | EthanYangTW | #545 |
| 1.1179 | Record: int5 GPTQ + 33.6M model (3-seed mean val_bpb=1.1179) | EthanYangTW | #585 |
| 1.1179 | Record: 11L Muon TTT + Entropy-Adaptive Epochs (8×H100) — val_bpb 1.1179 (3-seed mean) | aamodbhatt | #999 |
| 1.1179 | Record: Muon TTT + Entropy-Adaptive Epochs — val_bpb 1.1179 (3-seed mean) | TimPietruskyRunPod | #1037 |
| 1.1179 | Record: 11L Muon Legal TTT + Entropy-Adaptive Epochs (8×H100) — val_bpb 1.1179 (3-seed mean) | aamodbhatt | #1148 |
| 1.1179 | Non-record: SLOT eval-time delta optimization + QK-Gain (val_bpb=1.1179) | ibarrajo | #1236 |
| 1.1180 | [10min/16mb] David Ghazaryan — MoE + BigramHash4096 (mean BPB: 1.11799) | davie2009kh | #1451 |
| 1.1180 | records: David Ghazaryan — MoE + BigramHash4096 (val_bpb 1.11799) | davie2009kh | #1538 |
| 1.1180 | Record: 10L + Batched LoRA TTT (mean val_bpb=1.1180, 3 seeds) | hypery11 | #713 |
| 1.1180 | Record: Full GPTQ + LeakyReLU² + Parallel Muon (3-seed mean 1.1180) | kshitizz36 | #626 |
| 1.1181 | Record: SwiGLU+VE128+NoTTT val_bpb=1.1181 (3-seed mean) | JoeProAI | #505 |
| 1.1182 | Record: Depth Recurrence (layers 4 and 5 repeated): val_bpb 1.1182 | msisovic | #686 |
| 1.1182 | Record: Depth Recurrence + SGD TTT : 1.1182 BPB | Naazimsnh02 | #752 |
| 1.1182 | Non-record: 33.6M Int5 GPTQ + Legal s_0-only TTT (val_bpb=1.1182) | ibarrajo | #1004 |
| 1.1184 | Add LeakyReLU² + 4ep Legal TTT submission | yufengli-oai | #1039 |
| 1.1184 | Architectural Record: 1.11837 BPB via KGIIR Trajectory Mixing | Adam-Jacuch | #965 |
| 1.1185 | Non-record: Empirical Bayes Adaptive TTT (val_bpb=1.1185) | Robby955 | #484 |
| 1.1185 | LeakyReLU(0.75)² + Legal TTT + Parallel Muon — 1.1185 BPB (3-seed mean) | michaelwinczuk | #977 |
| 1.1185 | Record: MTP-2 Funnel + LeakyReLU(0.75)² + Legal TTT + Parallel Muon | michaelwinczuk | #1031 |
| 1.1185 | Non-Record: SLOT Eval-Time Augmentation on PR #549 SOTA Stack val_bpb = 1.1185 (3-seed mean, std 0.0003) | ~15.9 MB | 8×H100 SXM | AnubhavBharadwaaj | #1084 |
| 1.1186 | Record: CROWN-Q + Full GPTQ + SWA/EMA Blend — val_bpb 1.1186 (3-seed mean) | EthanYangTW | #690 |
| 1.1186 | Record: CROWN-Q + Full GPTQ + SWA/EMA Blend — val_bpb 1.1186 (3-seed mean) | EthanYangTW | #692 |
| 1.1186 | Record: CROWN-Q + Full GPTQ + SWA/EMA Blend — val_bpb 1.1186 (3-seed mean) | EthanYangTW | #693 |
| 1.1187 | Add run_17 8xH100 submission (1.118685, <16MB) | adityakm24 | #1098 |
| 1.1187 | Add run_17 8xH100 submission (1.118685, <16MB) | adityakm24 | #1117 |
| 1.1187 | Submission: 11L XSA4 + TrigramHash + ValueResidual + Legal TTT (val_bpb=1.1187) | adityakm24 | #1118 |
| 1.1187 | Add 11L RotaryFix + LegalTTT + BIGRAM3072 — val_bpb 1.11869 (3-seed m… | Upsalla | #714 |
| 1.1187 | -0.0041 BPB by Reordering Training Data (Curriculum Learning) | abaybektursun | #650 |
| 1.1188 | Non-record: Three Approaches + Lessons Learned (best: 1.1188 BPB) | ibarrajo | #1001 |
| 1.1189 | Hymba-11L: SOTA High-Density Takeover (1.1189 BPB) | Prush69 | #852 |
| 1.1190 | [non-record] Sharpness-Aware Minimization (SAM) Inner Loop for Meta-TTT | SPThole | #1601 |
| 1.1190 | GPTQ Int6 + SGD Test-Time Training — A800 1.1190 bpb | ChaosCodes | #610 |
| 1.1190 | Three Breadsticks: 1.1190 BPB | newjordan | #656 |
| 1.1190 | Non-record: 1.1190 BPB — Independent PR #549 Reproduction (10min 8×H100) | manfromnowhere143 | #1069 |
| 1.1190 | Non-record: Aweb Ultimate — 1.1190 BPB (10min 8×H100, independent PR #549 reproduction) | manfromnowhere143 | #1070 |
| 1.1194 | Add Stack Integration + Legal TTT submission package | Taleef7 | #1050 |
| 1.1194 | Add_Maestro_Solar_Protocol_Joeavaib | Joeavaib | #625 |
| ★ | 1.1194 | Record: LeakyReLU² + Legal Score-First TTT + Parallel Muon — val_bpb 1.1194 (3-seed mean) | abaybektursun | #549 |
| 1.1194 | feat: depth recurrence + cosine recovery TTT | Danishlynx | #697 |
| 1.1194 | Record submission: Poly5 Softcap + BigramHash(3072) + Wider GPTQ-lite… | jimliu741523 | #816 |
| 1.1194 | Record: 1.1194 BPB — v9 Batched Muon + Full GPTQ Random Calib + JEPA Research | NewyorkDev | #1124 |
| 1.1194 | [Submission] Jtss-ux - 1.1301 BPB (10min_16mb) | Jtss-ux | #1269 |
| 1.1195 | Record: GPTQ + Legal TTT (3-seed mean val_bpb=1.1195) | EthanYangTW | #503 |
| 1.1195 | Record: GPTQ + Legal TTT (3-seed mean val_bpb=1.1195) | EthanYangTW | #528 |
| 1.1195 | Record: GPTQ + Legal TTT (3-seed mean val_bpb=1.1195) | EthanYangTW | #529 |
| 1.1196 | Non-record: Negative results from gated multi-order hash n-grams | xiayicheng3-code | #1369 |
| 1.1196 | Non-record: TTT chunk ordering does not improve BPB — negative results from 7 ordering variants | jpfeiffe | #1320 |
| 1.1198 | Non-record: Full GPTQ + XSA-4 + Score-First TTT (3-seed mean 1.1198) | Robby955 | #734 |
| 1.1198 | Non-record: Fused Triton relu^2 kernel — negative result (val_bpb=1.1198) | ibarrajo | #1237 |
| 1.1199 | Non-record: 11L NativeFlowMatcher + Legal TTT — val_bpb 1.1199 (single seed) | Christopher-Lee-McClendon | #1170 |
| 1.1201 | Non-record: 1.1201 BPB - Shared ValueEmbedding (tok_emb reuse, layers 5-10) + Legal TTT | mradassaad | #768 |
| 1.1204 | Record: 11L LeakyReLU² + Full GPTQ + QAT Alignment (val_bpb: 1.1204) | raahilshah | #535 |
| 1.1207 | GPTQ + Short TTT — val_bpb 1.1207 (seed 1337) | newjordan | #533 |
| 1.1207 | GPTQ + Short TTT — val_bpb 1.1207 (seed 1337) | newjordan | #577 |
| 1.1207 | Vocab 8192 Entropy Optimized: 1.1207 BPB (3-seed mean) | tyrel-beede | #1284 |
| 1.1208 | XSA-11 + GPTQ b64/pd002 — 3-seed mean val_bpb 1.1208 | newjordan | #587 |
| 1.1208 | Add non-record streaming legal TTT late-block submission | simon-marcus | #662 |
| 1.1211 | Non-record: XSA-all + mHC + Full QAT (val_bpb=1.1211) | autocode-rayes | #928 |
| 1.1213 | Non-record: 11L EMA + TTT(20ep,freeze=0) + 15-run ablation study — val_bpb=1.1213 (3-seed) | felipe-parodi | #398 |
| 1.1214 | Record: Legal Score-First TTT + Parallel Muon — val_bpb 1.1214 (3-seed mean) | abaybektursun | #473 |
| 1.1215 | GPTQ + Early QAT + Legal TTT — 3-seed mean val_bpb 1.1215 | newjordan | #508 |
| 1.1215 | GPTQ + Early QAT + Legal TTT — 3-seed mean val_bpb 1.1215 | newjordan | #578 |
| 1.1215 | Non-Record: 11L Parallel Muon + LN Scale + LeakyReLU² MLP3x + Legal TTT — val_bpb 1.1215 (3-seed mean) | aryanbhosale | #838 |
| 1.1216 | Record: 11L XSA4 + Tight SWA + FA3 + Two-Phase TTT (val_bpb=1.1216) | EthanYangTW | #410 |
| 1.1216 | Record: 11L XSA4 + Tight SWA + FA3 + Two-Phase TTT (val_bpb=1.1216) | EthanYangTW | #415 |
| 1.1216 | Non-record: QNA + SQWA compression thesis (8xH100 SXM) | Abhishek8108 | #975 |
| 1.1217 | Record: Adaptive Precision Embedding Quantization (4-seed mean val_bpb=1.1217) | nothingLiva | #1042 |
| 1.1219 | Full-Training QAT: 1.1219 bpb | autocode-rayes | #836 |
| 1.1219 | XSA-All 11L + LeakyReLU(0.75)² + Aggressive Legal TTT → 1.1219 BPB | teddyoweh | #1092 |
| 1.1220 | Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220) | michaelwinczuk | #1081 |
| 1.1220 | 1.1220 bpb: GPTQ + EMA + XSA-all + BigramHash3072 (11L 512dim) | jorge-asenjo | #1361 |
| 1.1220 | Record: XSA-all + GPTQ + FA3 dtype fix (val_bpb: 1.1220) | G3sparky | #1494 |
| 1.1224 | [Record] Block Attention Residuals + Tuned Legal TTT — val_bpb 1.12242 (8xH100 primary) | kings-crown | #1696 |
| 1.1227 | Record: 11L XSA4 + Tight SWA + FA3 + Two-Phase TTT (3-seed mean val_bpb=1.1227) | EthanYangTW | #417 |
| 1.1227 | [track_10min_16mb] XSA7 + BigramHash + ValueResidual + Legal TTT — val_bpb=1.1227 | adityakm24 | #1182 |
| 1.1228 | Add non-record 16MB submission: quant-quality-first 1.12276 BPB | ciach | #1080 |
| 1.1228 | Add 11L TTT LoRA submission: SOTA architecture + per-document LoRA te… | ryanadamsai | #617 |
| 1.1229 | Record: 11L LeakyReLU² + VRL + lzma — val_bpb 1.1229 (3-seed mean) | anthony-maio | #175 |
| 1.1230 | Add non-record 11L XSA4 EMA run (val_bpb 1.12296, over 16MB) | kshitizz36 | #416 |
| 1.1230 | JEPArdy! Non-Record Submission - JEPA + Leader-Stack - val_bpb 1.1230 | simon-marcus | #1243 |
| 1.1231 | Record: 11L + Tight SWA + VE128 + Partial RoPE + LN Scale + TTT (val_bpb: 1.1231) | ElliotSlusky | #388 |
| 1.1231 | Frequency-Weighted Embedding Quantization (1.1231 BPB) | pattern4bots | #898 |
| 1.1231 | Non-record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 control (val_bpb=1.1231, 8xH100 verified) | AbhisekBasu1 | #429 |
| 1.1233 | Record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 (val_bpb=1.1233) | signalrush | #414 |
| 1.1233 | 5 novel architecture ablations on SOTA baseline | ssatia | #584 |
| 1.1233 | [WIP] Non-record: Local Ablation Pipeline — EMA + Int6 + Partial RoPE (GTX 1650) | gthgomez | #682 |
| 1.1234 | Add non-record 10min submission: 11L XSA4 + EMA + GPTQ + FA3 (1.12336724) | NewyorkDev | #636 |
| 1.1234 | Record: 11L LeakyReLU² + VRL + lzma — val_bpb 1.1234 | anthony-maio | #657 |
| 1.1236 | Late Training Replay + EMA + GPTQ-lite (val_bpb=1.1236, 2-seed, no TTT on eval) | newjordan | #445 |
| 1.1239 | Notable Non-Record Submission: 1.1239 BPB - 106.2M Binary Asymmetric U-Net + NeoMuon + 4xrelu²MLP + Smear + Fact Tied Emb + Poly5 Softcap + YaRN2048 + 8192BPE + FP8 + Bit-packing LZMA + Stride-16 Eval - 2h | CiprianFlorin-Ifrim | #641 |
| 1.1240 | Submission: 11L EMA + GPTQ-lite + Int6 (val_bpb: 1.1240) | Dhruba531 | #710 |
| 1.1240 | Non-record: GQA + LZMA + SLOT eval optimization (val_bpb=1.1240) | ibarrajo | #1249 |
| 1.1243 | Record: 11L + EMA + Tight SWA + QAT0.15 + VE128 + Partial RoPE + LN Scale (val_bpb: 1.1243) | newjordan | #401 |
| ★ | 1.1246 | Record: 11L + Tight SWA + Shared VE128 + Partial RoPE + LN Scale + XSA4 (val_bpb: 1.1246) | unnir | #374 |
| 1.1247 | Non-record: 11L PR315 Backout + Native FA3 RunPod (val_bpb=1.1247) | greqone | #394 |
| 1.1247 | Record: Parallel Muon + Parameter Banking — 81.87ms/step, val_bpb 1.1247 (3-seed mean) | abaybektursun | #399 |
| 1.1247 | REHA-DEQ-WSE: Deep Equilibrium with Weight Synthesis for Parameter-Efficient Language Modeling | sohv | #1323 |
| 1.1247 | Publish clean prune baseline 1.12470947 as non-record package | resouer | #1058 |
| 1.1248 | qat + ttt + value embeddings | bopmite | #218 |
| 1.1248 | Record: 11L Partial RoPE + LN Scale + EMA + XSA4 (val_bpb: 1.1248) | jfprincz | #315 |
| 1.1248 | Exploratory: PR315-derived candidate and looped-depth gate | Divyesh-Thirukonda | #453 |
| 1.1249 | [4090 Reproduction] Achieve 1.1249 val_bpb (Note: 18KB over limit) | samchill666 | #1717 |
| 1.1250 | Record: DominationV3 + GPTQ-lite + TTT25 (mean val_bpb=1.1250, 3 seeds) | yesbhautik | #64 |
| 1.1253 | Non-Record: 11L Parallel Muon + LeakyReLU² MLP3x + Legal TTT (val_bpb 1.1253) | aryanbhosale | #754 |
| 1.1254 | Record: 11L XSA+EMA+TTT, sliding val_bpb=1.1254 (3-seed mean 1.1256) | alertcat | #338 |
| 1.1257 | Non-record: Negative results & insights from 24hrs on 8xH100 | charmquark1984 | #375 |
| 1.1257 | Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257) | dannywillowliu-uchi | #379 |
| 1.1259 | Add competitive 8xH100 run package (1.1259 bpb) | adityakm24 | #1066 |
| 1.1261 | PP12: Bayesian posterior packets + selective gating (1.1261 BPB) | okezue | #1043 |
| 1.1264 | Non-record: GQA + LZMA + Selective Pruning (val_bpb=1.1264) | ibarrajo | #1248 |
| 1.1266 | sp4096 + 10L 3.5x MLP + GPTQ + TTT (1.1266 BPB) | Idan3011 | #1431 |
| 1.1268 | New SOTA: 1.12676 BPB - 11L XSA-all(11) + GPTQ-lite + EMA + Late QAT | gowtham0992 | #478 |
| 1.1269 | 11L VRL + Parallel Muon + Legal TTT v2 (val_bpb=1.1269, non-record) | ADIITJ | #1016 |
| 1.1270 | Record: 11L Tight SWA + Partial RoPE + LN Scale + XSA4 (val_bpb: 1.1270) | sadeghja1070 | #564 |
| ★ 1.1271 | Record: 11L XSA + EMA + Int6 MLP3x + WD=0.04 (val_bpb: 1.1271) | jfprincz | #287 |
| 1.1271 | Non-record: Custom serialization replacing torch.save + zstd-22 | joyceyan | #1649 |
| 1.1276 | Record: BESE Tokenizer 287 vocab — 1.1276 BPB | mrbese | #1327 |
| 1.1280 | 10-min record: 13L int4 MLP + qTTT + QAT Precompile + ANS Hybrid (val… | yunoshev | #1683 |
| 1.1284 | Neural Cache: Cross-Window KV Caching for Extended Eval Context (research proposal) | sseanliu | #318 |
| 1.1284 | Research: Why Novel Architectures Fail at 16MB — Throughput-Quantization Co-optimization | sseanliu | #831 |
| 1.1287 | Non-record: 11L XSA4 + EMA + SDTTT (3-seed mean val_bpb=1.1287) | dentity007 | #406 |
| 1.1289 | Non-record: Scylla Tokenizer Byte Accounting Audit — Sub-1.0 Was a Measurement Error | andrewbaggio1 | #1271 |
| 1.1290 | Add non-record pre-TTT anchor submission | amrayach | #1101 |
| 1.1295 | Record: Sponge Bath — TTT 8ep eval-only improvement (val_bpb: 1.1295) | newjordan | #390 |
| 1.1296 | Record: 11L CANON-AC(last5)+DeltaGate Report (Humble Record Attempt, val_bpb: 1.1296) | chanwoo-park-official | #400 |
| 1.1299 | Record: 11L Tight SWA + VE128 + XSA4 + TTT (3-seed mean val_bpb=1.1299) | kasimte | #455 |
| 1.1303 | Record: FarnsworthEngine v1 — TTT + 11L Int6 MLP3x, val_bpb=1.1303 | timowhite88 | #254 |
| 1.1303 | Non-record: 11L LeakyReLU² + EMA + LZMA Int6 (val_bpb: 1.1303, 2-seed mean) | htrung1105 | #1311 |
| 1.1307 | Record: 11L + Efficient Partial XSA (val_bpb: 1.1307) | unnir | #265 |
| 1.1307 | Non-record: 8xH100->1xH100 Two-Stage GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes | Jaksenc | #1475 |
| 1.1309 | Record: 11L EMA + Int6 + XSA + LeakyReLU² + Partial RoPE (val_bpb: 1.1309) | parinzee | #493 |
| 1.1311 | Record: 11L XSA4 + EMA + LoRA TTT + Partial RoPE + dim480 — val_bpb 1.13112 (3-seed) | dentity007 | #1127 |
| ★ 1.1318 | 11-Layer Int6 + WD=0.04 + SWA + FA3 (val_bpb: 1.1318) | jfprincz | #198 |
| 1.1320 | Record: 12L Gradient-Guided Quant + Partial RoPE + LN Scale + EMA + XSA4 (val_bpb: 1.1320) | saml212 | #332 |
| 1.1320 | Record: 11L Full Stack + XSA4 + Tight SWA + Late QAT (val_bpb=1.1320) | joelnishanth | #383 |
| 1.1324 | Non-record: Depth Recurrence + Int7 Mixed Quant — val_bpb 1.1324 (3-seed mean) | iverbovoy | #1453 |
| 1.1324 | Record: 11L + XSA4 + EMA + Late QAT + GPTQ-lite (1.1325 BPB) | pragnyanramtha | #531 |
| 1.1326 | Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB) | JoeProAI | #861 |
| 1.1326 | Draft: SOTA+ TTT + RoPE50K + EMA + Curriculum (pending H100 run) | 0xjaishy | #223 |
| 1.1327 | Record: 7L MLP3x + BigramHash + SmearGate + TTT 5ep (mean val_bpb=1.1327) | sofiabod | #489 |
| 1.1328 | Submission: 11L NTK-RoPE + FA3 + Batch524K + XSA4 + EMA (val_bpb=1.1328) | signalrush | #369 |
| 1.1329 | Non-record: Negative findings on codebook quantization, magnitude pruning, multi-token prediction, embedding factorization | mrdavtan | #212 |
| 1.1330 | Non-record: 11L MLP3.5x LeakyReLU(0.5)^2 + Full SOTA Stack (mean val_bpb=1.1330, 8xH100) | aryanbhosale | #344 |
| 1.1330 | Non-record: 11L MLP3.5x LeakyReLU(0.5)^2 + Full SOTA Stack (mean val_bpb=1.1330, 8xH100 SXM) | aryanbhosale | #635 |
| 1.1334 | Non-Record: BPB 1.1334 — 7000-Step Training + Mixed Int6/Int8 Quantization + Legal TTT | Christopher-Lee-McClendon | #598 |
| 1.1335 | Non-record: Verifily Three-Tier Token Weighting + DCLS Salience (SP1024, 1.1335 BPB) | arsenis-cmd | #1634 |
| 1.1336 | Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1336 (15.59 MiB) | JoeProAI | #1040 |
| 1.1345 | 11L MLP3x + Int6 QAT + XSA + EMA + BigramHash + FA3 (val_bpb 1.1345) | tmustier | #359 |
| 1.1346 | Track 10min_16mb: PR #287 family rerun at 585s wallclock (mean val_bpb=1.1346) | tmustier | #483 |
| 1.1347 | Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request) | Christopher-Lee-McClendon | #1166 |
| 1.1349 | SOTA Submission (1.1349 BPB) by weywey [10min_16mb track] | Upsalla | #646 |
| 1.1349 | Track A: 11L U-Net + BigramHash + SmearGate + Partial RoPE + QAT (1.1349 bpb) | Omrigotlieb | #1086 |
| 1.1349 | Non-record: Online Hessian GPTQ (val_bpb=1.1349) | ibarrajo | #1251 |
| 1.1352 | Non-record: Fixed Bank QAT + XSA5 + Label Smoothing (1.1352) | suchitj2702 | #667 |
| 1.1354 | Record: 11L EMA + BigramHash(12288) + Mixed Int5 + FA3 (1.1354) | simonbissonnette | #466 |
| 1.1354 | Record: 11L + Partial XSA + TTT + BatchOpt (val_bpb=1.1354) | ibarrajo | #290 |
| 1.1354 | Non-record: 1.1354 BPB — 10L TTT 22ep AdamW Cosine + LeakyReLU(0.5)² + TrigramHash | bigbag | #562 |
| 1.1355 | The Frugendorff: Recursive Weight Sharing for Transformer Compression (1.1478 BPB, 15.19MB) | newjordan | #579 |
| 1.1356 | Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1356 (15.60 MiB) | JoeProAI | #1041 |
| 1.1357 | Record: 11L XSA4 + EMA + Batch524K + zstd fallback (val_bpb: 1.1357) | dennisimoo | #307 |
| 1.1360 | Record: 11L XSA6 + Warmdown3000 + QAT@0.30 (val_bpb=1.1352, 2-seed mean) | 0xNoramiya | #695 |
| 1.1361 | 11L + XSA4 + EMA(0.997) + seq2048 + Int5-MLP + MuonWD=0.04 + LateK-FP16 — val_bpb=1.1361 | HyperPotatoNeo | #372 |
| 1.1364 | Record: 11L Backout + Int6 + SWA (val_bpb: 1.1364) | sheeki03 | #339 |
| 1.1364 | Record: Dynamic Eval + TTT on SOTA Pipeline (val_bpb=1.1364) | translatingthename | #397 |
| 1.1364 | Non-Record: Ouroboros — Crawler Architecture Research (1.1364 BPB) | newjordan | #1308 |
| 1.1365 | 10L XSA + EMA + Partial RoPE + LN Scale (val_bpb: 1.1365) | ofirkris | #458 |
| 1.1365 | 11L + Hadamard Rotation + VE128 + cuDNN SDPA (val_bpb: 1.1365, 3-seed mean) | EaCognitive | #586 |
| 1.1366 | 10L XSA + EMA + Partial RoPE + LN Scale (val_bpb: 1.1366) | ofirkris | #452 |
| 1.1370 | 10L XSA + LeakyReLU² + Partial RoPE (val_bpb=1.1370) | parinzee | #434 |
| 1.1371 | Non-record: EMA+SWA Tight Averaging with Fused TTT LoRA + Sliding Window (1.1371 BPB) | yunoshev | #1366 |
| 1.1372 | Crawler Transformer 3f+2cx2 + SP8192 + SDClip + Post-Quant TTT — val_bpb 1.1372 | Tonyy1977 | #1579 |
| 1.1373 | Ouroboros — 1.13727008 val_bpb (seed 444) | newjordan | #1283 |
| 1.1374 | Record: val_bpb: 1.14020 [tested 3x on 8xh100] | andrewgcodes | #267 |
| 1.1381 | Non-record: val_bpb=1.1374, FA2+SWA adaptation of Farnsworth | charmquark1984 | #281 |
| 1.1381 | Non-record: v6.2 Phase 5a SOTA-trivial stack (3-seed re-run @66% = 1.138112; TTT 1.204 not competitive) | sisegod | #1465 |
| 1.1382 | Submission: DominationV2 + BOS-Reset Bigram Cache + TTT (val_bpb=1.1382, 3-seed mean) | shouryamaanjain | #958 |
| 1.1387 | Record: 1.1387 BPB — 11L LeakyReLU² + Early QAT@0.5 + GPTQ-lite + EMA | 0xadvait | #979 |
| 1.1387 | Non-Record: BPB 1.13872 — LeakyReLU(0.5)² + Per-Layer LR Legal TTT (3 seeds) | Christopher-Lee-McClendon | #537 |
| 1.1388 | Submit Int6 QAT parameter-golf entry | malc3om | #403 |
| 1.1396 | Record: 11L Gradient-Guided Adaptive Quant + EMA + Sliding Eval (val_bpb=1.1396) | albertorkive | #422 |
| 1.1399 | Record: 11L XSA + EMA + Int5-MLP (val_bpb=1.1399) | Mapika | #349 |
| 1.1399 | Record: 11L Next-Gen Stack + Custom Kernels, val_bpb=1.1399 | anthony-maio | #376 |
| 1.1400 | Record: 11L Int6 + SmearGate + Batch Optimization (val_bpb=1.1400) | saml212 | #236 |
| 1.1400 | Feature/sota optimizations | adityagupta26 | #358 |
| 1.1400 | feat: Ultimate SOTA submission - 10L Model, Mixed Int6 QAT, and TTT/LoRA Evaluation | adityagupta26 | #361 |
| 1.1401 | Record: 11L XSA + EMA + TTT + Partial RoPE + LN Scale — val_bpb=1.1401 | mrdavtan | #371 |
| 1.1402 | QAT x SWA Ablation: SWA sabotages QAT (-3.64 mBPB, 3-seed validated) | alexanderaperry-arch | #989 |
| 1.1403 | [Record] Stride-32 + Warmdown/Muon Tuning on SOTA #1: mean val_bpb=1.1403 | haikosys | #274 |
| 1.1407 | 12 layers GPT + MLP_MULT reduction + VE and BIGRAM modifications | rubenbalbastre | #845 |
| 1.1407 | Record: 1.1407 BPB — LeakyReLU^2 + Delayed QAT + Score-First TTT | Dhenenjay | #1087 |
| 1.1412 | Record: Unified Attention + FA3 + Legal TTT (val_bpb=1.1412, 3-seed) | VirajDeshwal | #1202 |
| 1.1412 | 12L XSA-all + Partial RoPE + Batch 786K (1.1412 BPB, 13.5 MB) | KevinChunye | #1630 |
| 1.1412 | [Non-record] Universal Transformer Depth Recurrence INT6 | thestbobo | #1640 |
| 1.1417 | Non Record: Add PPM heuristic for test time learning | AnirudhRahul | #511 |
| 1.1418 | Non-record: 27M params at Int5 QAT / train larger, quantize harder (val_bpb=1.1418) | cmcdnd | #469 |
| 1.1418 | Non-record: VR + GA + Late QAT + Full GPTQ — 1.1418 BPB, 15.7 MB | anantdgoel | #601 |
| 1.1422 | Add non-record 4xH100 10L Int5-MLP submission | ReNothingg | #602 |
| 1.1425 | Non-record: 11L + 30-Epoch Legal TTT (BPB 1.14252) | Christopher-Lee-McClendon | #526 |
| 1.1426 | Non-record: QAT & EMA negative results on SOTA stack (val_bpb=1.1426) | MultiFe22 | #360 |
| ★ 1.1428 | Record: 10L Int5-MLP + BigramHash(10240) + SWA(0.4) + WD=0.04 (val_bpb=1.1428, mean 3 seeds) | thwu1 | #180 |
| 1.1428 | Non-record: 4090 single-GPU ablations on ValCalib GPTQ + XSA stack (partial logs) | Wolfie8935 | #1226 |
| 1.1428 | Value Residual + Gated Attention + XSA + EMA + AdamW TTT — val_bpb pending H100 | sahiee-dev | #430 |
| 1.1428 | [track_10min_16mb] 50-Epoch Cosine LoRA TTT + SOTA (10L Int5/Int6 BigramHash SWA) — Atharva Date (ADIITJ) | ADIITJ | #467 |
| 1.1428 | Record: 11L NonTTT VR+GA MixedInt5/6: val_bpb=1.1428 (3-seed, 8xH100) | Asukabot0 | #516 |
| 1.1428 | Add submission: 10L Enhanced with BigramHash(12240) + SOTA techniques | instax-dutta | #563 |
| 1.1428 | Depth Recurrence (3+3 x 2 loops) + HW Optimizations | maorinka | #648 |
| 1.1428 | Non-record: Technique Taxonomy — Tier List, Interaction Effects, and BPB Verification Tools | robbiebusinessacc | #891 |
| 1.1428 | Non-record: Technique Taxonomy — Tier List, Interaction Effects, and BPB Verification Tools | robbiebusinessacc | #892 |
| 1.1431 | Bigram-Aware Context Modeling with Mixed-Precision Quantization (val_bpb: 1.1431) | CREVIOS | #443 |
| 1.1431 | Bigram-Aware Context Modeling with Mixed-Precision Quantization (val_bpb: 1.1431) | CREVIOS | #447 |
| 1.1431 | Non-record: Turbo-Muon + EngramLite(10240) + VE(8,9,10) — val_bpb 1.1431 | SergheiBrinza | #1205 |
| 1.1433 | 12L Int5-MLP + SmearGate + BigramHash + SWA (val_bpb 1.1433) | unixmadtoonslab | #76 |
| 1.1434 | Add non-record 16MB submission: Vocabulary1792 FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ MixedBits | shram86 | #1474 |
| 1.1436 | [Non-record] XSA + EMA + TTT: Negative interaction study (val_bpb=1.1436) | sseanliu | #303 |
| 1.1441 | Progressive Depth + Hedge Mixer — val_bpb 1.1441 (3-seed mean) | iverbovoy | #1384 |
| 1.1442 | Record: 11L XSA4 + EMA + TTT + Int6 MLP3x (val_bpb=1.1442) | chris-buckley | #317 |
| 1.1443 | Non-record RTX4060Ti 11L LeakyTTT 24h local (1.1443 BPB) | monkeyKingProgrammer | #1244 |
| 1.1444 | Non-record: LeakyReLU(0.5)^2 on SmearGate + BigramHash + Int6 stack (1.1444 bpb) | oidebrett | #1256 |
| 1.1444 | Non-record: 11L DepthRec PolarNS SWA | anderamondarainh-stack | #1661 |
| 1.1444 | Submission/qat bigram12k stride32 | EthanYangTW | #348 |
| 1.1446 | Non-record: 11L Depth Recurrence + High-Yield Legal TTT (1.14458 BPB) | Christopher-Lee-McClendon | #461 |
| 1.1448 | Non-record: QAT Int5/Int6 on #1 architecture (1.14476 BPB) | xuafeng | #306 |
| 1.1448 | submission: LeakyReLU2 + TrigramHashEmbedding (1.1448 bpb) | BhatiaUday | #884 |
| 1.1450 | Submission: TrigramHash + PartialRoPE + HeadTemp + stride32 (val_bpb: 1.1450) | Ananddna | #327 |
| 1.1450 | Non-record: GDN Hybrid (E2E TTT / State-Space Model) — val_bpb 1.14502 | andrewbaggio1 | #1479 |
| 1.1451 | 11L + LN Scale + BigramHash 3072x112 + GPTQ: val_bpb=1.1451 | jayzuccarelli | #1675 |
| 1.1452 | Non-record submission: 11L XSA4 + EMA + BigramHash3072 + LZMA (1.1452 BPB) | Buld1n | #1386 |
| 1.1453 | feat: parameter golf v15b - SWA tuned (1.1453 val_bpb) | aktasbatuhan | #574 |
| 1.1454 | WIP: Shared-transformer + warmdown-aligned training (not final submis… | leofeasby | #420 |
| 1.1454 | Non-record: Shared-weight transformer with extended warmdown (1.1454 val_bpb) | leofeasby | #470 |
| 1.1454 | Progressive Depth + Hedge Mixer — val_bpb 1.1454 | iverbovoy | #856 |
| 1.1454 | Record: LoRA TTT on GPTQ — val_bpb TBD (10min_16mb) | DilpreetBansi | #1457 |
| 1.1455 | 11L Int5-MLP + TTT-SGD + SmearGate + SWA (1.1455 BPB) | stukenov | #264 |
| 1.1455 | Non-record: Reproduction of SOTA #1 (SmearGate+BigramHash+Int6+SWA) on RunPod 8xH100 | AbhayAnandUCSD | #1071 |
| 1.1456 | Non-record: MoE exploration + multi-bit quantization analysis | imyesung | #480 |
| ★ 1.1458 | Record: Int6 MLP3x + SmearGate + BigramHash + MuonWD + SWA (mean val_bpb=1.1483) | raahilshah | #162 |
| 1.1460 | Non-record: Focal Loss (gamma=2.0) — val_bpb=1.1460 | ibarrajo | #1233 |
| 1.1461 | Non-record: AR Self-Generated GPTQ Calibration (val_bpb=1.1461) | ibarrajo | #1234 |
| 1.1462 | Add Looped Transformer Design non-record submission (non tuned) | Aum08Desai | #325 |
| 1.1464 | Add LLMAdvisor submission: 1.14638 BPB (track_10min_16mb) | harborglowvintage-oss | #451 |
| 1.1464 | Add LLMAdvisor submission: 1.14638 BPB (track_10min_16mb) | harborglowvintage-oss | #665 |
| 1.1464 | Record: 12L RecycledCore Int5 — val_bpb 1.1464 (seed 1337) | shivangbaveja | #1573 |
| 1.1465 | Non-record: LLaDA-MDLM Diffusion — val_var_bpb 1.1465 (first diffusion to beat AR baseline) | agalimova | #1100 |
| 1.1465 | Non-record: MDLM Diffusion — val_var_bpb 1.1465 (first diffusion to beat AR baseline) | agalimova | #1106 |
| 1.1465 | Non-Record: HybridQuantGPT v6.1 H100 + Aggressive SLOT (steps=100, 3-seed 1.146523) | sisegod | #1456 |
| 1.1466 | Record: 11L Int5-All + XSA5 + EMA + 10% Pruning (val_bpb=1.1466) | trasnake87 | #389 |
| 1.1466 | Non-record: 11L mixed int5/int6 + working QAT + TTT (val_bpb=1.1466) | vytautas-bunevicius | #421 |
| 1.1466 | Record: 12L + Catalytic Residuals + BigramHash(10240) + SWA + Late QAT (val_bpb=1.1466, mean 3 seeds) | zachgoldfine44 | #450 |
| 1.1470 | [Non-Record] Hymba-8L: Hybrid SSM + Sliding Window Attention with 32K Context (1.1470 BPB) | mkenney2 | #1245 |
| 1.1470 | Non-record: XSA F.normalize fix + byte-shuffle/brotli + Muon WD as compression knob | Bananakin1 | #1709 |
| 1.1472 | Record: 11L, int6+zstd, decoupled WD (val_bpb = 1.1472) | devin-cog | #179 |
| 1.1473 | Non-record: Mamba-3 Hybrid SSM + SP8192 + Legal TTT — 1.1473 bpb | mradassaad | #1643 |
| 1.1473 | Non-record: Mamba-3 Hybrid SSM + SP8192 + Legal TTT — 1.1473 bpb | mradassaad | #1644 |
| 1.1476 | Submission: 12L Int5-MLP BigramHash10K EMA (1.1476 BPB) | Skytuhua | #592 |
| 1.1477 | Non-record submission: BigramDim160 + 10% Prune + SWA (1.14767 bpb, 2 seeds) | bryjudy | #637 |
| 1.1477 | [Record Submission] QAT Int5/Int6 + Backout + U-Net Skips + BigramHash(10240) + SWA50 — val_bpb=1.1477 | gowtham0992 | #295 |
| 1.1478 | Results of 2026-03-23_MixedQAT_Int5MLP_Int6Attn | StolbaJ | #709 |
| 1.1478 | Record: 11L Int6 QAT + SmearGate + OrthoInit + SWA + TTT (val_bpb=1.1478) | yahya010 | #150 |
| 1.1478 | The Frugendorff: Recursive Weight Sharing + MLP 4x (1.1478 BPB, 15.19MB) | newjordan | #498 |
| 1.1478 | The Frugendorff: Recursive Weight Sharing + MLP 4x (1.1478 BPB, 15.19MB) | newjordan | #499 |
| 1.1478 | Pre-Enrichment + EMA-GPU + SmearGate + XSA4 (val_bpb=1.1478, … | Idan3011 | #996 |
| 1.1480 | Record: 11L Int6 QAT + SmearGate + SWA + SAM: 1.1480 BPB (3-seed mean) | baudrillardsgh0st | #194 |
| 1.1483 | Add non-record 10min/16MB submission: Wavelet-Lite PR549 Parallel Muon (1.1483) | bro4all | #680 |
| 1.1487 | 10L MLP3x + BigramHash(2048) + SWA + Stride-32: 1.1487 BPB | Rhodrium | #331 |
| 1.1488 | Non-record: 11L Int6 QAT + SmearGate + SWA(0.4) + WD=0.04 (3-seed mean val_bpb=1.1488) | dentity007 | #385 |
| 1.1489 | Record: 10L Int5-MLP3x BigramHash4096 SlidingEval — mean val_bpb 1.1489 | suchihype | #583 |
| 1.1490 | Weight Entropy Regularization: Improved SWA Averaging (+0.028 BPB) | mer2234 | #459 |
| 1.1492 | Non-record: Asymmetric 1/10 Split — 1.1492 pre-quant BPB on 8xH100 (one-line change) | ranausmanai | #1275 |
| 1.1493 | SpotlightLFB + Aux-Int6 Compression — 1.1493 BPB (3-seed mean) | ymrohit | #1142 |
| 1.1497 | Record: 11L Int6+Zstd MLP3x SmearGate BigramHash OrthoInit MuonWD EMA (mean val_bpb=1.1497) | mkenney2 | #362 |
| 1.1497 | Record: Batch-Optimized 524K + Warmdown 4000 (val_bpb 1.1497) | shikhar1729 | #364 |
| 1.1497 | Submission: SP8192 + Depth Recurrence + Muon 0.99 (1.1497 pre-quant BPB) | DevelopedByAnurag | #1739 |
| ★ 1.1502 | Update: 11L MLP3x + WD=0.04 + zstd-22 (val_bpb 1.1502) | aruniyer | #86 |
| 1.1502 | Record: 11L Int6 QAT + SmearGate + WD 0.038 (val_bpb=1.1502) | baudrillardsgh0st | #192 |
| 1.1502 | nGPT on the Hypersphere: Making Normalized Transformers Work at 16MB (Research) | DbBested | #1108 |
| 1.1507 | Record: Int6 STE + SmearGate + Seq2048 + OrthoInit + RoPE50K + SWA/100 (mean val_bpb=1.1507) | dexhunter | #206 |
| 1.1507 | [10min/16MB] AWQ + Cyclic Momentum + ReLU² + 11L Shared — 1.1507 bpb | SPThole | #623 |
| 1.1507 | 10L Int5-MLP + BigramHash(4096) + SWA (1.1507 BPB) | Bortlesboat | #694 |
| 1.1508 | Record: 10L d=512 Int5-MLP Int6-Attn sp1024 (val_bpb=1.1508) | LoquiAuris | #465 |
| 1.1509 | [Non-Record] XSA-all-layers + VRL + bigram3072 + lzma9 — 1.1509 bpb, AdamW TTT findings | Hilo-Hilo | #1045 |
| 1.1510 | Non-record submission: 1.15 BPB in 16MB (GPTv3) | LappyG | #1068 |
| 1.1511 | FP8 + Arithmetic Coding + SWA (1.1511 BPB) | cruz-andr | #538 |
| 1.1518 | SmearGate + BigramHash + Int6 + SWA + U-Net Skips (1.1518 BPB) | integrate-your-mind | #289 |
| 1.1520 | Non-record: 11L int5/int6 + XSA + online TTT w/ decay prior (single-run val_bpb=1.1520) | JackYoung27 | #302 |
| 1.1520 | Non-record: Knowledge Distillation - A Negative Result (val_bpb=1.152) | fielding | #1029 |
| 1.1521 | Non-record: TurboQuant mixed-precision int4/int5 (val_bpb=1.1521) | ibarrajo | #1238 |
| 1.1522 | Record: 10L CountInitBigram + XSA + PartialRoPE (val_bpb=1.1522) | harsha-gouru | #477 |
| 1.1522 | Record: 10L CountInitBigram + XSA + PartialRoPE (val_bpb=1.1522) | harsha-gouru | #482 |
| 1.1522 | Record: 10L CountInitBigram + XSA + PartialRoPE (val_bpb=1.1522) | harsha-gouru | #485 |
| 1.1524 | Submission: OrthoInit + Int6 MLP3x + SmearGate + BigramHash (val_bpb: 1.1524) | jfprincz | #164 |
| 1.1526 | PROTEUS v9 — 11L INT6 + single-epoch LoRA TTT (mean val_bpb=1.1526, 3 seeds) | MatoTeziTanka | #633 |
| 1.1526 | Non-record: Mamba-3 Hybrid + Full Hessian GPTQ + Late QAT — val_bpb 1.1526 | mradassaad | #1355 |
| 1.1527 | Non Record: Partially Random MLP | meinlebenswerk | #1228 |
| 1.1531 | BESE v5.3: Novel 288-token tokenizer (non-record 16MB) | mrbese | #1621 |
| 1.1531 | Record: BESE 288-vocab Novel Tokenizer — 1.1531 BPB (3-seed mean) | mrbese | #1666 |
| 1.1532 | Add non-record shared-weight Frugendorff submission | siddhantparadox | #773 |
| 1.1532 | Record submission : Int6 + MLP 3x + Flash Attention 3 + NorMuon, val_bpb = 1.1532 | tamoghnokandar | #173 |
| 1.1532 | Non-record submission: Depth Recurrence + Legal Score-First TTT (10L, 1.1532 BPB) | Christopher-Lee-McClendon | #456 |
| 1.1533 | Non-record: KNN Hidden State Retrieval — Scale Deception from Weak to Strong Models (8xH100) | himanshudongre | #1259 |
| 1.1536 | [Non-Record] Competitive Baseline: 10L GQA + Mixed Int6/Int8 + SWA + Seq4096 (val_bpb=1.1536) | rithunkp | #1065 |
| 1.1537 | Add ContextFuse-2048-BigramSmear submission | Julz19 | #174 |
| 1.1538 | Add non-record unlimited-compute 11L LeakyTTT 16h local RTX 4060 Ti run | monkeyKingProgrammer | #1008 |
| 1.1539 | Record: OrthoInit + Int6 MLP3x + BigramHash + SmearGate (val_bpb: 1.1539) | unnir | #135 |
| 1.1539 | [Record Submission] - 74.3M Ternary U-Net Transformer (v2 — Continuation from PR #640) | CiprianFlorin-Ifrim | #920 |
| 1.1541 | Non-record: 12L Int5-MLP + Int6-Attn mixed quantization, val_bpb=1.1541 | alertcat | #219 |
| 1.1541 | Record: Int6 + MLP 3x + NorMuon + SmearGate + BigramHash + OrthoInit + Sliding Window, val_bpb=1.1541 | MatthewHRockwell | #230 |
| 1.1547 | Non-record: 11L XSA-All + EMA + Legal GPTQ on 1xH100 PCIe (1.1546 bpb) | Rtx09x | #1353 |
| 1.1548 | Non-Record: 11L Low-Rank on Q192 (val_bpb=1.1548) 14.7MB in decimal | JayCheng113 | #215 |
| 1.1550 | Record: Long Context + All Optimizations submission | chinesepowered | #166 |
| 1.1551 | LAWA-EMA frontier fork (pr198 base, SWA -> LAWA val_bpb=1.1551) | machdragon | #201 |
| 1.1552 | feat(arch): Mish² Activation & PyTorch Native SDPA GQA Core (1.155 BPB) 8xH100 | demirelo | #653 |
| 1.1554 | Add PR114 RunPod H100 SXM non-record submission | greqone | #252 |
| ★ 1.1556 | Record: Mixed Quant Int6/FP16 + SmearGate + OrthoInit + MLP 3x + Sliding Window, val_bpb=1.1556 | aquariouseworkman | #65 |
| 1.1558 | Record: 1.1558 BPB — 11L U-Net + Catalytic + SwiGLU + SW64 | skarakulak | #507 |
| 1.1565 | 11L XSA + SmearGate + BigramHash + SWA (mean val_bpb=1.1565, 3 seeds) | mahsumaktas | #186 |
| 1.1565 | 11L XSA4 + SmearGate + BigramHash + SWA + RoPE50K (mean val_bpb=1.1565, 3 seeds) | mahsumaktas | #333 |
| 1.1567 | Non-record: Focal Loss for LM Pretraining — 1.1567 int8 BPB on RTX 4000 Ada (3-line change) | ranausmanai | #1380 |
| 1.1568 | Add 2026-03-20 11L dense-lexical submission candidate | ajkpersonal | #207 |
| 1.1568 | Staging: Int6 MLP3x 11L + SmearGate + BigramHash4096x128 + MuonWD038 + SWA50 + DocSliding (single-run val_bpb=1.1568) | ajkpersonal | #208 |
| ★ 1.1570 | Record Submission: 1.1570 BPB - 73.7M Ternary U-Net + NeoMuon + 4x relu²MLP + Factored Tied Emb + Poly5 Softcap + YaRN2048 + 8192BPE + FP8QAT + Bitmask-LZMA + Stride-16 Sliding | CiprianFlorin-Ifrim | #640 |
| 1.1570 | Fix: move Ternary UNet submission folder from track_10min_16mb to track_non_record_16mb | janwww | #730 |
| 1.1573 | Non-record: 11L XSA + Score-First LoRA TTT (1.1573, 1xH100) | swapp1990 | #1090 |
| 1.1574 | Record: val_bpb=1.1574 — Int6 + MLP 3x + selective precision + optimized long-context training | saml212 | #114 |
| 1.1574 | submission: 10L Int5-MLP + Aggressive Warmdown (WD=20000) — targeting <1.14 bpb | outsourc-e | #365 |
| 1.1574 | Non-record: 10L Int5-MLP + TTT + Backout Connection (val_bpb=1.1574 on 8xH100 SXM) | shivnarainms22 | #366 |
| 1.1574 | 12L INT4 bQAT + Value Embeddings — val_bpb 1.1588 | SoHarshh | #1009 |
| 1.1574 | 11L INT6 XSA-all + EMA + VE — ttt_bpb 1.1487 | SoHarshh | #1216 |
| 1.1575 | Non-record: 10L Int6 QAT + SmearGate + SWA (val_bpb=1.1575) | dentity007 | #273 |
| 1.1576 | Non-record: Legal Neural-Only No-TTT Alt (8xH100) val_bpb=1.1576 | aamodbhatt | #947 |
| 1.1580 | record: 1.158 | krammnic | #106 |
| 1.1580 | Non-record: CoDA-GQA Differential Attention — First Differential Attention Submission (val_bpb=1.1580) | anthony-maio | #932 |
| 1.1585 | Non-record: Gaussian per-token loss reweighting — what goes wrong and why (+0.014 bpb) | JulianTang2027 | #1360 |
| 1.1590 | record: 10L d496 WarmDown3500 SWA — val_bpb 1.1590 (1xH100 proxy) | Hilo-Hilo | #901 |
| 1.1591 | Record: 11L XSA4 + EMA + Partial RoPE + Rank-8 TTT Hooks (1.1591 bpb) | Divyesh-Thirukonda | #492 |
| 1.1594 | Record: Int6 MLP3x + STE QAT + Sliding Window (val_bpb=1.1594) | rsavitt | #128 |
| 1.1596 | Add SP4096 11L432 MLP3x Int6+Zstd Momentum99 record (val_bpb=1.1596) | kshitizz36 | #251 |
| ★ 1.1598 | Record: 10L Int6 QAT + Zstd MLP2.6x Muon0.99 Sliding Window (val_bpb 1.1598) | yahya010 | #63 |
| 1.1598 | Record: Compression-Funded MLP3x (val_bpb=1.1598) | chris-buckley | #191 |
| 1.1601 | Non-record: WiderMLP + FP16 Embed + Stride-32 (val_bpb=1.1601) | ansh-deriv | #222 |
| 1.1601 | 12L rANS + LeakyReLU(0.95)² + Soft XSA (1.1601 BPB, non_record_16mb) | turbo-indubitable | #1215 |
| 1.1602 | feat(record): Int6 STE + NorMuon + SWA + Sliding Window (val_bpb=1.16019) | dexhunter | #156 |
| 1.1603 | Record: Sliding Window Eval, 2048 Vocab Size, fp16 embeddings, SWA, NorMuon, FA3; mean_val_bpb:1.160 | mtybadger | #122 |
| 1.1604 | Cautious Muon + SP4096 + Depth Recurrence — val_bpb 1.1604 (non-record) | X-Abhishek-X | #1381 |
| 1.1605 | Record: Int6 MLP3x + MTP + Sliding Window Eval (val_bpb=1.1605) | seanward | #88 |
| 1.1605 | submission: Int6 MLP3x + Late-K Passthrough + SlidingWindow (val_bpb: 1.1605) | takhir-iota | #99 |
| 1.1606 | Non-record: Legal Neural-Only No-TTT (8xH100) val_bpb=1.1606 | aamodbhatt | #946 |
| 1.1609 | Non-record: 11L Int6 + Online Logit Bias (val_bpb=1.1609) | bopmite | #330 |
| 1.1618 | Int6 MLP3x + Tuned LR + SmearGate + SlidingWindow (val_bpb: 1.1618) | unnir | #102 |
| 1.1622 | record: val_bpb=1.1622, NorMuon + int6 STE + SWA + sliding window | vmfunc | #89 |
| 1.1623 | Record: MLP3x + Int8 Tok Emb + Grouped LZMA + Sliding Window (val_bpb=1.1623) | ChaseWNorton | #160 |
| 1.1624 | Add non-record 11L int6 challenger 8xH100 attempt | JWLBOYCE | #209 |
| 1.1628 | Record: 10L Int5-MLP + SmearGate + BigramHash + Late QAT (val_bpb=1.1628) | chris-buckley | #286 |
| 1.1629 | Record: Pre-Enrichment + Encoder Recurrence + XSA + SmearGate + BigramHash (val_bpb=1.1629) | Idan3011 | #187 |
| 1.1629 | Late STE QAT + Int6 MLP3x + SmearGate + BigramHash + OrthoInit + Overtone + SWA + SGD TTT (int6+zstd-22) | davidpuertolas | #297 |
| 1.1631 | Record/smaller batch sota, val_bpb 1.16314679 (post-quant, int6+zlib, sliding eval) | ankitmaloo | #147 |
| 1.1632 | ArjunAutoResearch: MLP 3x + STE int6 QAT + seq4096 + sliding window. val_bpb 1.1632 | arjun-krishna1 | #66 |
| 1.1634 | Record: SwiGLU + BigramHash + SWA, val_bpb=1.1634 (8xH100 verified) | JoeProAI | #373 |
| 1.1634 | Non-record submission: RecurrentTiedDepth_8x2_FiLM records | loveless2001 | #552 |
| 1.1639 | Non-record submission: Weight-Tied 6Lx2 d=672 (1.1639 BPB) | yuitokyouni | #1568 |
| 1.1642 | Add MergedTop3_v3 clean 8xH100 record-track submission | hesong0222-dev | #698 |
| 1.1642 | Record: Vocab 4096 + MLP 3x + Sliding Window Eval (mean val_bpb=1.1642, 3 seeds) | saikrishnarallabandi | #123 |
| 1.1645 | [Non-record] Meta-Learned TTT + Error-Guided Adaptation Analysis (val_bpb=1.1645) | sseanliu | #294 |
| 1.1645 | [Non-record] Meta-Learned TTT + Error-Guided Adaptation Analysis (val_bpb=1.1645) | sseanliu | #296 |
| 1.1646 | Non-Record: McGilchrist Register Token — causal cumulative mean + FiLM global context pathway | aramdov | #1022 |
| 1.1648 | Int6+zstd MLP1488 + Sliding Window + QAT + Tuned LR (val_bpb=1.1648) | m0at | #107 |
| 1.1650 | 12L INT4 bQAT + EMA Fix + Deterministic QAT — val_bpb ~1.165 | SoHarshh | #1002 |
| 1.1653 | Add record: 9L MLP3x LeakyReLU(0.5)² QAT Int6+zstd (val_bpb=1.1653) | andreanjos | #929 |
| 1.1659 | Submission: Wider MLP 3x + int6 quant + sliding window eval, val_bpb=1.1659 | jfprincz | #70 |
| 1.1659 | Memory Tokens + Mixed Quantization (val_bpb: 1.1659) | sp00mm | #351 |
| 1.1659 | Memory Tokens + Mixed Quantization (val_bpb: 1.1659) | sp00mm | #352 |
| 1.1666 | Record: Int6 + MLP 3x + STE QAT + NorMuon + sliding window (val_bpb 1.1666) | abhishekgahlot2 | #116 |
| 1.1666 | Record: Int6 + MLP 3x + STE QAT + NorMuon + sliding window (val_bpb 1.1666) | abhishekgahlot2 | #137 |
| 1.1667 | Add Nuclear Stack submission: 1.16668 BPB (seed 2884431328) | timowhite88 | #178 |
| 1.1668 | Record: Int6 + Canon ACD (K=3) + Muon WD 0.04 + SWA + Sliding Eval (val_bpb=1.1668) | chanwoo-park-official | #312 |
| 1.1669 | Record: Int6 QAT + SmearGate + Muon WD (val_bpb=1.1669) | baudrillardsgh0st | #170 |
| 1.1670 | Record: SwiGLU + MLP 3x + Int6 + LoRA TTT, val_bpb=1.1670 (8xH100) | polarizedfortnite-cpu | #81 |
| 1.1672 | 12L Full-INT4 (MLP + Attn) + BigramHash(4096) — val_bpb 1.1672 | Naazimsnh02 | #305 |
| 1.1682 | Non-record: S4D-Lin SSM Hybrid — Fixing Why Mamba Failed in Parameter… | himanshudongre | #1013 |
| 1.1688 | Non-record: Emergent weight symmetry in QO projections + learnable SymMix | gersh | #1214 |
| 1.1690 | Non-record: 6-Technique Stack — Catalytic Residuals + Value Residual + Gated Attention + BigramHash(10240) + 12L (val_bpb=1.1690) | joshuaswarren | #474 |
| 1.1696 | Recursive Transformer 4B/7L + VE + QAT + TTT — val_bpb 1.1696 (3-seed mean) | Tonyy1977 | #927 |
| 1.1697 | [Non-record] LoRA TTT + HParams (val_bpb=1.16973333) | Mistobaan | #299 |
| 1.1702 | submission: Int6 MLP3x + QAT + SlidingWindow (val_bpb: 1.1702) | trovatochris | #117 |
| 1.1702 | [Non-Record] QAT + NTK-4096 Eval + Cosine Warmdown + Aggressive SWA | crony-io | #324 |
| 1.1704 | Record: Int6 3xMLP + Cosine Warmdown (val_bpb=1.1704) | kvmukilan | #243 |
| 1.1704 | non-record: int6 3xMLP + cosine warmdown (1.1704 bpb) | kvmukilan | #246 |
| 1.1704 | non-record: int6 3xMLP + cosine warmdown (1.1704 bpb) | kvmukilan | #249 |
| 1.1708 | SubSixteen v2: Int6 QAT + MLP 3x + SWA + Sliding Window (val_bpb 1.1708) | TevBenji | #69 |
| 1.1711 | Non Record: GPTQ int7 XSA BigramHash — val_bpb 1.1711 | Rajat123456789 | #1378 |
| 1.1715 | Non-record: PrismLM v3 — DiffTransformer V2 + NorMuon + TrigramHash (val_bpb=1.1715) | yashverms | #418 |
| 1.1719 | Add WaveletWeightedWidenet submission directory with README and metadata | dubthecat | #211 |
| 1.1720 | Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB) on 11L Production Stack | anantdgoel | #487 |
| 1.1722 | Add lzma6 submission (1.172 bpb, 10min_16mb) | lee101 | #329 |
| 1.1724 | Non-record: Compression moonshots — 8 negative/marginal findings (Procrustes, SWA smoothness, selective fp16, pruning+zstd) | mrdavtan | #1048 |
| 1.1725 | The Stinky Frost Recipe — 1.1725 BPB | newjordan | #190 |
| 1.1725 | Add non-record EMA and adaptive export exploration | someone114514 | #424 |
| 1.1732 | Add submission: 10L Slide64 Mid6, val_bpb=1.1732 | GLDRoger | #176 |
| 1.1734 | Non-record: LoRA TTT exploration on SOTA base (negative result) | hmlizama | #658 |
| 1.1734 | Non-record: Higher-Rank Output Heads — Standard Tied Head Wins on a Frontier 11L Baseline | albertorkive | #908 |
| 1.1739 | Non-record: 10L FP16-Embed + Warmdown20k | codestrongestx | #381 |
| 1.1744 | Add TTT (Test-Time Training) submission: 1.1767 BPB | timowhite88 | #152 |
| ★ 1.1748 | Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) | notapplica | #60 |
| 1.1750 | Add 10min/16MB record: skinny RLM seq2048 (int8+zlib val_bpb 1.1750) | k-oconnor | #575 |
| 1.1752 | Int5/Int6+Zstd+MLP3x: mean val_bpb=1.1752 (10L, seq4096, sliding window) | shajalahamedcse | #546 |
| 1.1752 | Record: Int5/Int6+Zstd+MLP3x — mean val_bpb=1.1752 (10L, seq4096, sliding window) | shajalahamedcse | #547 |
| 1.1753 | Record: SP4096 int6+zstd 10L496 overtone+phase sliding (val_bpb=1.1753) | kshitizz36 | #217 |
| 1.1761 | Nightcrawler — 1.176bpb 10mb | newjordan | #1208 |
| 1.1763 | Add non-record submission for full80 recurrence seq1536 20k | shasank0001 | #1345 |
| 1.1764 | Sliding Window + Long-Context Training: val_bpb=1.1764 | saml212 | #96 |
| 1.1768 | Add seq4096 sliding-window fp16 tok coarsen record | takhir-iota | #75 |
| 1.1770 | Non-record: BitNet b1.58 - 68M ternary params, val_bpb=1.1770, systematic analysis of ternary limitations | ksang123 | #367 |
| 1.1779 | DenseContextQuantTrim 8xH100: 1.1779 val_bpb | IvGolovach | #256 |
| 1.1779 | Add ContextFuse-2048 submission | Julz19 | #143 |
| 1.1787 | Record: 10L Seq2048 TTT LoRA WarmdownQuant (val_bpb=1.1787) | vishesh9131 | #310 |
| 1.1791 | Non-record: 11L 3x MLP Seq2048 — val_bpb 1.1791 (8xH100 SXM) | Rohan-Abhilash | #1505 |
| 1.1792 | MetaStack v3: 1.1792 sliding bpb, 10L BigramHash SmearGate OrthoInit SWA | xinpw8 | #205 |
| 1.1801 | Non-record: AutoResearch Value Embeddings + MLP3x, 1.1801 bpb (1x RTX 4090) | ivanontech | #1139 |
| 1.1801 | Non-record: AutoResearch Value Embeddings + MLP3x, 1.1801 bpb (1x RTX 4090) | ivanontech | #1141 |
| 1.1803 | SP8192 + 9-Layer + Breadcrumb Gating + EMA + Stochastic Depth - 1.1803 BPB (legal) | Unwindology | #1724 |
| 1.1804 | Non-record: 11L Partial RoPE + XSA4 + VE128 + Tight SWA + GPTQ-lite (val_bpb=1.1804) | rarce | #534 |
| 1.1804 | Non-record: 11L Partial RoPE + XSA4 + VE128 + Tight SWA + GPTQ-lite (val_bpb=1.1804) | rarce | #543 |
| 1.1807 | Submission: Atris Labs v8 (audited seed42, clean branch) | keshav55 | #671 |
| 1.1807 | Non-record: Int6 QAT + MLP1472 + SlidingWindow + TTT (val_bpb=1.1807) | lookin-zz | #301 |
| 1.1807 | Record: Atris Labs — 3-seed mean val_bpb=1.1807, 10L MLP3x Int5/Int6 BigramHash SmearGate SWA | keshav55 | #515 |
| 1.1807 | Int6 GPTQ-lite + LeakyReLU(0.5)^2 + EMA + 11L MLP3x | zeytx | #805 |
| 1.1812 | Add 3xMLP + Mixed Quant + Blockade/Sigma submission (val_bpb: 1.1812) | GMaN1911 | #172 |
| 1.1821 | Non-record: Semantic Tube Regularization — Geometry Improves, BPB Doesn't (Compute–Regularization Tradeoff) | albertorkive | #894 |
| 1.1826 | Non-record: Soft MoE Exploration — Dense Gating Fixes Sparse Router Collapse Under 16MB (WIP, val_bpb=1.1826) | HugoOchoaLP | #660 |
| 1.1828 | [Non-Record] Hymba: Hybrid Attention + Mamba SSM (val_bpb 1.1828) | mkenney2 | #599 |
| 1.1834 | Add non-record 16MB submission: FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ | shram86 | #1447 |
| 1.1834 | Add non-record 16MB submission: FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ | shram86 | #1448 |
| 1.1836 | PROTEUS EMA — val_bpb: 1.1836 (3-seed mean, Notable Non-Record) | MatoTeziTanka | #95 |
| 1.1839 | 11L + XSA + VRL + SWA + seq4096 + cross-doc TTT - val_bpb 1.1839 | carlesonielfa | #457 |
| 1.1844 | Non-record: Linearized Neural Memory + TTT (val_bpb=1.1844) | mihir-s-05 | #182 |
| 1.1854 | Non-record: No-FA3 stack combination — val_bpb 1.1854 (1-seed, 8xH100) | akaiHuang | #1442 |
| 1.1855 | Record: Pre-Enrichment + Encoder Recurrence (val_bpb=1.1855) | Idan3011 | #184 |
| 1.1858 | Record: 8192 Vocab Size, NorMuon, Selective Quantization; 1.186 val_bpb | mtybadger | #78 |
| 1.1864 | Add record: Optimizer Tuning + Sliding Window Eval (val_bpb=1.1864) | andreanjos | #321 |
| 1.1870 | Record: FP16 Embed + Sliding Window Eval + Warmdown Tuning (pending eval) | JoeProAI | #113 |
| 1.1873 | [Non-Record] Hymba-LongContext: 32K context training via hybrid SSM + SWA (1.1873 BPB) | mkenney2 | #914 |
| 1.1874 | Crawler — 8.8MB, 1.1874 BPB (3-seed mean, 8xH100, 600s) | newjordan | #1140 |
| 1.1875 | Non-record: Mamba3 Hybrid + GPTQ Long Context (1.1875 BPB) | samquiring | #1268 |
| 1.1876 | Record: sliding eval, FP16 tied embeddings, 10 layers, Muon WD 0.02, overtone init, and phase-transition residual mixing. (val_bpb 1.1876) | peytontolbert | #155 |
| 1.1882 | Preliminary: 11L VRL + Full GPTQ + Parallel Muon + Legal TTT — val_bpb 1.1882 (ADIITJ) | ADIITJ | #960 |
| 1.1884 | Add seq4096 fp16 tok coarsen record | takhir-iota | #74 |
| 1.1888 | 1.1888 BPB via SP-4096 compression + stride-64 sliding window | kshitizz36 | #53 |
| 1.1890 | 11L INT6 + Backward-Looking Per-Document LoRA TTT | haimianbaobao007 | #550 |
| 1.1893 | Non-record: staging profile (LAWA + slide eval) on 8xH100 (val_bpb=1.18926428) | machdragon | #197 |
| 1.1896 | [Non-Record] JEPA Self-Distillation with EMA Target Encoder for Autoregressive LM (val_bpb: 1.19) — Current Noisy/Negative Result | MVPandey | #896 |
| 1.1898 | Non-Record: DG Attention, Differential-Gated Attention with Depth-Scheduled Novelty Encoding: (val_bpb=1.1898) | ddavidgao | #542 |
| 1.1899 | Submission: 10L + Sliding Window eval (mean val_bpb=1.1899) | shajalahamedcse | #221 |
| 1.1903 | Non-record: Byte-level transformer + JEPA auxiliary loss (val_bpb: 1.1903) | jfprincz | #832 |
| 1.1914 | Record: CLASE-Quant adaptive layer quantization (val_bpb=1.1914) | NewyorkDev | #309 |
| 1.1915 | Non-record: Oscillatory Recurrence at Layer 0 (1.1915 BPB, 3-seed) | amabito | #1221 |
| 1.1917 | Non-record: random-map adapter train/eval ablations on 8xH100 | papalino456 | #1164 |
| 1.1920 | Restore non-record submission: 2026-04-08 Vocab1792 FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ | shram86 | #1496 |
| 1.1921 | SP8192 Depth Recurrence + Parallel Residuals + TTT (1.1921 BPB) | yu314-coder | #1628 |
| 1.1925 | Record: Quant Quality: val_bpb=1.1925 | ankitmaloo | #142 |
| 1.1925 | Record: Sliding Window Eval (stride=64), val_bpb=1.1925 | mattqlf | #50 |
| 1.1925 | Submission/2026 03 22 Sliding Window + WARMDOWN + AttnRes + PhiSimple (mean 1.1925 BPB) | ikermoel | #500 |
| 1.1928 | docs: add TIPS.md and resolve environment dependency issues (#280, #82, #43) | adityagupta26 | #357 |
| 1.1928 | Non-record: Single H100 10 min 1.24 BPB | adityasasidhar | #1547 |
| 1.1929 | Add non-record BigramHash4096 + MLP992 + LR0.08 + Slide64 submission | josusanmartin | #355 |
| 1.1929 | Non-record: SWA and doc-isolated eval ablation — two negative findings at stride=64 | mrdavtan | #199 |
| 1.1932 | Non-record: BitNet Ternary — 65M params in 15.9MB (1.1932 BPB) | chrislovescoding | #666 |
| 1.1933 | Record: 7L MLP3x 4kSeq LR-Tuned (val_bpb=1.1933) | sofiabod | #446 |
| 1.1935 | Non-record: 1x RTX PRO 6000 Blackwell 10L Int5-MLP (1.1935 BPB) | Rohan5commit | #560 |
| 1.1938 | Record: 8192 Vocab, Sliding Window Eval, Selective Quantization; 1.194 val_bpb | saikrishnarallabandi | #92 |
| 1.1942 | Record submission: Distill+IntraLoop SP1024 9x512 (val_bpb=1.1942) | divagr18 | #1623 |
| 1.1946 | Non-record: XSA-All + QK Gain 4.0 + LN Scale — 45 Experiments on 1×RTX 5090 | jainpranjal97 | #1125 |
| 1.1948 | VQ-VAE Weight Compression (non-record track) | WeijieChen2017 | #1335 |
| ★ | 1.1950 | [record bpb=1.195] sliding window + LoRA TTT | samacqua | #77 |
| 1.1957 | Add TTT-LoRA 512d submission (val_bpb=1.1957) | santosh5541 | #157 |
| 1.1957 | Record: Add TTT-LoRA 512d submission (val_bpb=1.1957) | santosh5541 | #159 |
| 1.1957 | Record: Add TTT-LoRA 512d submission (val_bpb=1.1957) | santosh5541 | #161 |
| 1.1962 | Non-record: DepthScale — Parameter-Shared Iterative Transformer (1.1962 BPB) | Lumi-node | #1509 |
| 1.1973 | Sliding Window Eval + Muon6 (val_bpb 1.1973) | beee003 | #169 |
| 1.1974 | Non-record: AutoResearch Batch Optimization — 1.1974 bpb (1× RTX 4090) | ivanontech | #1036 |
| 1.1978 | Merge: Autoresearch/mar28 experiments on 4xH20 | demouo | #1052 |
| 1.1980 | Progressive Depth Training — val_bpb 1.1980 | iverbovoy | #835 |
| 1.1986 | Non-Record Submission: 1.1986 BPB — HybridQuantGPT v6.1 rANS + Legal TTT | sisegod | #1123 |
| 1.1989 | Non-record: MUD optimizer — triangular Gram preconditioning (arxiv:2603.17970) | SelfAnush | #510 |
| 1.1995 | Shallow Blue: BOS-Reset Exact Memory Probe | jxgod | #1478 |
| 1.1996 | Int5 MLP + Int6 Attn + zstd-22, val_bpb 1.1996 | edidisheng | #1059 |
| 1.2005 | Non-record Submission: SwiGLU 3x + Dynamic Wallclock Cosine | yuvraajbains | #799 |
| 1.2006 | Add SmearGate+BigramHash context-repair submission (1.2006 BPB, 15.0MB) | handemanai | #448 |
| 1.2012 | Record: SP4096 + Int6 QAT + NorMuon (val_bpb=1.2012) | khasinski | #37 |
| 1.2012 | Record: SP4096 + Int6 QAT + NorMuon (val_bpb=1.2012) | khasinski | #200 |
| ★ | 1.2014 | New SOTA attempt (val_bpb=1.2014) | spokane-way | #52 |
| 1.2026 | Record: 6L depth minimalism U-Net sliding window - val_bpb 1.2025 | alphastar1111 | #1527 |
| 1.2026 | Record: 10L Int5-MLP + Mixed Quant + GradClip + Warmdown3k (mean val_bpb=1.20262) | aniketio-ctrl | #426 |
| 1.2029 | Non-record: Mixed-Int6 LZMA9 B3072 Warm5000 | sabdulmajid | #1438 |
| 1.2029 | Non-record: BitNet b1.58 — 65M ternary params beat 4-hour baseline in 10 minutes (val_bpb=1.2029) | ksang123 | #139 |
| 1.2035 | Non-record: 12L Low-Rank Q + QAT (1xH100, pre-quant 1.2035) | SkywardSyntax | #316 |
| 1.2036 | Record: SEQ_LEN=4096 training | lenguyen1807 | #231 |
| 1.2037 | PROTEUS v4 — non-record submission (val_bpb: 1.2037) | MatoTeziTanka | #368 |
| 1.2045 | Non-record: FP16 embed + WD20k + seq2048 + doc-isolated sliding window (val_bpb=1.2045) | mrdavtan | #151 |
| 1.2050 | Add healing-phase training submission (1.205 val_bpb) | luccifer00 | #1752 |
| 1.2052 | Non-record: QAT ablation — int8 QAT overhead exceeds quantization gap recovery | mrdavtan | #145 |
| 1.2055 | Non-record: Competitive Stack + Phonetic Tokenization Exploration (val_bpb=1.2055, 4xH100) | nalediym | #454 |
| ★ | 1.2058 | SOTA attempt (val_bpb=1.2064) | spokane-way | #49 |
| 1.2058 | Add 1.20 BPB submission with Legal TTT and Calibration (9L/448D) | Vibes-me | #976 |
| 1.2058 | Add 1.20 BPB submission with Legal TTT and Calibration (9L/448D) | Vibes-me | #1038 |
| 1.2064 | Non-record: leader-core valid-eval parity run + 1xH100 proxy screens | simon-marcus | #244 |
| 1.2064 | [Notable Non-Record Submission] To JEPA or Not to JEPA: That Is Le Question (32.8M LeWorldModel Mamba2 Style Text Implementation - 1.2064 BPB ) | CiprianFlorin-Ifrim | #903 |
| 1.2065 | Non-record: Depth Recurrence + XSA + LeakyReLU² (val_bpb 1.2065) | iverbovoy | #784 |
| 1.2066 | Add 1.2066 record: 8L Depth Recurrence by trhgbao | trhgbao | #1472 |
| 1.2067 | [Non-record] Codebooks! - val_bpb 1.2067 (3-seed mean) | mtybadger | #1433 |
| 1.2070 | [Non-record] Scaled Byte-level H-Net matches 4-hour subword-level baseline (H-Net val_bpb = 1.2070) | DariusFeher | #1305 |
| 1.2073 | Record: 1.2073 bpb • 11L gold6 • 8xH100 | pall23-mech | #649 |
| 1.2075 | Non-record: Systematic Hyperparameter Search (val_bpb=1.2075) | nglain | #141 |
| 1.2079 | [Non-Record] LegendreGPT: Legendre polynomial depth parameterization | sergimichi | #1337 |
| 1.2089 | Non-record: Int6 QAT + 11L 512d + Sliding Window, val_bpb=1.2089 | dibdabo | #225 |
| 1.2091 | SwiGLU dim=576 + Sliding Window + Muon WD (1.2091 BPB) | Focus2321 | #163 |
| 1.2092 | Non-record: Add submission track_non_record_16mb/2026-03-23_DepthRecurrent_TTT | SergiuDeveloper | #495 |
| 1.2092 | LeakyReLU + XSA + PartialRoPE + FA3 submission — val_bpb 1.1991 | kjahan | #1427 |
| 1.2093 | [WIP] Record: Hybrid architecture 8L 3:1 GDN/Transformer (val_bpb=1.2093) | phulin | #651 |
| 1.2094 | Non-record: Full Attention + LZMA + small BigramHash (val_bpb=1.2094) | ibarrajo | #1250 |
| 1.2097 | Non-record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.209735 | Abhishek8108 | #1553 |
| 1.2098 | basic submission improving baseline | elad-simbalista | #1748 |
| 1.2101 | Record: Seq2048 training + eval (val_bpb=1.2101) | ibarrajo | #136 |
| 1.2115 | Add Parameter Golf submission: Vocab768_LinearPhaseInit_GatedXSA_EMA_… | shram86 | #1015 |
| 1.2135 | 11L 512d Int8+Zlib Baseline (val_bpb 1.2135, 3-seed) | nickferrantelive | #858 |
| ★ | 1.2139 | Record: 10L Mixed Precision: val_bpb=1.2147 (10 layers + int6 middle layers) | nanlliu | #39 |
| 1.2151 | Byte-Level Tokenizer-Free Transformer: 1.2151 BPB (beats baseline 1.2244) | seanward | #705 |
| 1.2154 | warmdown-quantization val_bpb = 1.2154 | saml212 | #61 |
| 1.2156 | Record (pending): 92-experiment autoresearch + sliding window eval, pre-quant val_bpb=1.2156 | hydeh3r3 | #85 |
| 1.2160 | NTK Eval + Overtone Init (val_bpb=1.2160) | notapplica | #59 |
| 1.2162 | Mixture of Convolutions (MoC): token-adaptive short convolutions via kernel mixtures | andrewmouldon | #966 |
| 1.2164 | Non-record: ASQU activation, Mixture of Convolutions, BankedLinear | andrewmouldon | #679 |
| 1.2168 | Submit 9L 2xMLP optimized parameter run with val_bpb 1.2168 | Git-Aarya | #736 |
| 1.2174 | Record: 11L Adaptive Markov + Int6 Mixed Quant (1.2174 bpb) | Jayteare | #1046 |
| 1.2182 | V2 Prototype: SwiGLU + Dropout + MuonWD + MidLayerLoop | starfly-web | #340 |
| 1.2185 | Add BitNet b1.58 Ternary Quantization (non-record submission) | erikqu | #760 |
| 1.2192 | [Non-Record] Single H100, <16MB, 1.21 BPB | adityasasidhar | #1617 |
| 1.2194 | Aweb Optimized Baseline — 1.2194 BPB | manfromnowhere143 | #181 |
| 1.2196 | Depth Recurrence + Cross-Repeat Skip + Sliding Window Eval | iverbovoy | #148 |
| 1.2196 | Non-record: Annealed Muon 1.58-bit Ternary — val_bpb 1.2196 (8xH100 SXM) | DushyantChetiwal | #1273 |
| 1.2197 | fp16 tied embedding + warmdown/LR tuning (val_bpb 1.2197) | chonchiog | #42 |
| 1.2199 | [notable non record] Train-Time Overparameterization: Better Models Through Transient Expansion | andrewmouldon | #1551 |
| 1.2200 | Non-record: 12L Compression-Aware Training Orchestration with ProxQuant | mollahasani | #1357 |
| 1.2201 | 11L MLP2x + LeakyReLU² + Legal TTT (val_bpb=1.2201, 3-seed mean, std=0.0015) | Programmerryoki | #1057 |
| 1.2206 | Non-record: DP tokenizer beats naive baseline but fails to close gap to SP — 47-run controlled study (1.2206 @ 4096) | Stuckertks09 | #1504 |
| 1.2207 | Non-record: 11L PartialRoPE + LNScale + EMA + SWA + TTT (1xH100 107min, val_bpb=1.2207, 15.4MB) | nathon-lee | #334 |
| 1.2207 | [non-record track] BankLinear: cross-layer shared weight bank | andrewmouldon | #1315 |
| 1.2208 | Record: CUTLASS EVT Backward MLP Fusion + Brotli + Turbo-Muon + Memmap | abaybektursun | #1105 |
| 1.2208 | Proposal: Validate ASQU on the March 22 10min/16MB control line | fahmitech | #1247 |
| 1.2219 | [10min/16MB] TrigramHash + EMA-SWA + Int4 QAT — val_bpb 1.2219 | Ashutosh3142857 | #440 |
| 1.2225 | [non-record track] Hierarchical Shared Attention (HSA): multi-level sharing across attention heads | andrewmouldon | #1264 |
| 1.2236 | [non-record track] BankLinear: cross-layer shared weight bank with learned + random mixtures | andrewmouldon | #812 |
| 1.2240 | Modal 8xH100 LowerLR FP16Embed 960 (val_bpb 1.22395) | kiankyars | #45 |
| 1.2244 | [Partial submission] naive baseline + dispersion loss | ChenLiu-1996 | #34 |
| 1.2244 | Submission: Top-Heavy FFN Allocation + Packed Int6 Export (pending eval) | mr-ashish-panday | #110 |
| 1.2244 | [WIP] Sparse Attention + Recursive Weight Sharing for 16MB Efficiency | albertorkive | #5 |
| 1.2244 | Tier 6: PPM-C eval-time context mixer (standalone + neural mixing) | Cwarren15-A | #283 |
| 1.2244 | Submission/fastattn mtp dr | AVINASH0052 | #1691 |
| 1.2249 | Notable Non-Record: Universal Transformer — 1.2249 BPB — Depth Recurrence with Iteration Embeddings | gowtham0992 | #1110 |
| 1.2252 | Submission/hybrid rwkv token shift | dillon-blake | #1007 |
| 1.2252 | Submission/hybrid RWKV token shift | dillon-blake | #1112 |
| 1.2257 | commit non-record | jupram | #437 |
| 1.2271 | Ultimate recurrent: 21 techniques — depth recurrence, novel ops | MrINVISO | #298 |
| 1.2283 | Add non-record submission for 15L hybrid tail seq1536 20k | shasank0001 | #1346 |
| 1.2286 | saliency-guided local 5090 non-record | liveyourday | #1580 |
| 1.2294 | Non-record: 2026-03-22_SuperchunkBPE_SP1024 | eshansinghal14 | #506 |
| 1.2296 | Add Modal 8xH100 timed validation non-record submission | kiankyars | #41 |
| 1.2299 | Non-record: TWEO early-cosine outlier regularization on SP1024 baseline | PapaFranku4647 | #1636 |
| 1.2302 | Non-record: LeakyReLU² + LAWA + Ramping WD + Val Training (val_bpb=1.2302, 1xH100) | ChideraIbe123 | #675 |
| 1.2314 | Structured Embedding Init (BPB 1.2314) | brandonpf | #1165 |
| 1.2320 | Add record: INT6 10L SWA NorMuon, val_bpb=1.2320 | Akasxh | #204 |
| 1.2321 | Non-record: JEPA v3 — span-masked I-JEPA + VICReg, val_bpb 1.2321 | aiejvn | #1581 |
| 1.2326 | [non_record_16mb] 12L dim=448 LeakyReLU^2 BGVOCAB=2048 GH200 proxy (val_bpb=1.2326) | Okropniak | #1253 |
| 1.2334 | Non-record: Hybrid Depth-Recurrent Transformer + Int5 Quantization Studies | trasnake87 | #288 |
| 1.2355 | Add chasewebb 9x512 sp1024 baseline (val_bpb: 1.2355) | chasewebb | #195 |
| 1.2364 | Non-record: TTT-LoRA Base — HumanAI Convention (val_bpb=1.2364) | humanaiconvention | #600 |
| 1.2370 | Non-record: 9L SwiGLU MLP2 on 8xH100 (val_bpb 1.2370, 15.9MB) | ntwari-bruce | #1428 |
| 1.2374 | Add MaxParams6L_120 submission (1.2374 BPB) to track_non_record_16mb | NishantDahal | #391 |
| 1.2374 | Add MaxParams6L_120 submission (1.2374 BPB) to track_non_record_16mb | NishantDahal | #395 |
| 1.2381 | [Submission] Warmdown Scheduling - 1.2430 BPB on 8×H100 SXM | MajdiZamim | #48 |
| 1.2392 | Non-Record: 8L + BigramHash(12288) + Systematic HyperOpt (val_bpb=1.2392, 1xH100, 129 experiments) | CrimsonSithria | #436 |
| 1.2392 | Add BigramHash: hashed bigram embeddings with optional dim projection | CrimsonSithria | #441 |
| 1.2409 | Non-record: Universal Transformer with Adaptive Computation Time | 5en5e1 | #1293 |
| 1.2417 | Non-record: 7L + BigramHash Projection + Batch Scaling (val_bpb=1.2417, 1xH100) | CrimsonSithria | #393 |
| 1.2421 | Add submission: Mixed Quantization + BigramHash + SWA (val_bpb 1.2421) | SergheiBrinza | #370 |
| 1.2427 | Non-record: 10L mixed int5/int6 export reaches ~10.4MB with strong throughput | simon-marcus | #272 |
| 1.2450 | Add non-record submission: Multi-model cross-attention with dimensional asymmetry | alientony | #1352 |
| 1.2459 | Submission: val_bpb=1.2459 (autoresearch-optimized) | joeynyc | #343 |
| 1.2498 | Single H100, 10 min, <16MB, 1.24 bpb | adityasasidhar | #1559 |
| 1.2500 | Blackwell local nonrecord | pall23-mech | #793 |
| 1.2519 | Non-record: GatedDeltaNet, 32K Context, Document-Boundary State Reset | brian386 | #939 |
| 1.2529 | Non-record: Cache LM + LoRA TTT (negative result on cache, positive on TTT) | anantdgoel | #183 |
| 1.2540 | Non-record unlimited-compute: 1-hour 1xH100 warmdown 9x512 | aamodbhatt | #111 |
| 1.2542 | Non-Record Universal Transformer submission. (2x Attention layers, 3 Layer MLP, depth scheduling) | serdardoesml | #1088 |
| 1.2552 | Non-Record : Hybrid XSA-SSM | Jash-Vora | #1524 |
| 1.2552 | Non-Record 16 MB Track : Hybrid XSA-SSM | Jash-Vora | #1525 |
| 1.2554 | Non-record: wip Random-Basis MLPs + LoRA | camden-git | #1684 |
| 1.2604 | Add baseline and depth recurrence submissions (1xH100 20min runs) | henrycashe26 | #822 |
| 1.2622 | Add non-record JEPA byte-level encoder-decoder submission | gravelBridge | #696 |
| 1.2623 | [Non-record] Azure 1xH100 frontier-family engineering run (val_bpb=1.2623) | micoverde | #580 |
| 1.2634 | Non-record: LeakyReLU + Sliding Window Eval + Zstd compression | NICOH-YAY | #1390 |
| 1.2639 | Add LoRA exploration non-record archive | reyhandl | #1439 |
| 1.2659 | Non-Record: First Viable 3-Loop Recurrence — Birkhoff + Output-LN + Timestep Scaling (val_bpb=1.2659, 14 eff layers from 6 unique blocks) | aazizyan | #855 |
| 1.2663 | Non-record: Depth-recurrent 5x3 d768, val_bpb=1.2663 | JackYoung27 | #30 |
| 1.2663 | Non-record: Depth-recurrent 5x3 d768, val_bpb=1.2663 | JackYoung27 | #31 |
| 1.2697 | Optimized SOTA Submission: 1.2697 bpb | vavo | #46 |
| 1.2699 | [Non-Record] JEPA Baseline — LLM-JEPA pretraining — 1.2699 bpb | IshiPareek | #1480 |
| 1.2699 | [Non-Record] Modified LLM-JEPA pretraining from scratch — 1.2699 bpb; add int6 quantization + lzma | IshiPareek | #1654 |
| 1.2701 | [WIP] add combined optimization, waiting for 8 gpu train | Billy1900 | #131 |
| 1.2716 | Non-record: Depth Recurrence 5x3 — Weight-Shared Looping Transformer (6xH200, val_bpb=1.2716) | Arth-Singh | #319 |
| 1.2734 | Non-record: Diffusion-Noised Teacher AR Hybrid (val_bpb=1.2734, 8xH100) | anthony-maio | #904 |
| 1.2767 | Non-record: 10Layer + BigramHash + SWA + Attention-Residuals | AtomChen0425 | #632 |
| 1.2771 | Non-record: 1xH100 warmdown100 30m scaling run | aamodbhatt | #501 |
| 1.2774 | Non-record: 1xH100 Budget Run — SmearGate + BigramHash + MLP3x (1.2774 BPB) | tsubasagit | #1463 |
| 1.2781 | Non-record submission: HELIX and HELIX MoR K7R2 U-Net (architecture report + finalized metadata) | sayujshah | #1600 |
| 1.2791 | Non-record: trigram phrase-memory ablation on 1×H100: negative result (1.2791 BPB best) | maxwellcipher | #571 |
| 1.2824 | WSD Cosine Decay Schedule + 10L Int5-MLP BigramHash SmearGate SWA | ShihChunHao | #744 |
| 1.2824 | submission/2026-03-25_WSD_CosineDecay_Schedule | ShihChunHao | #791 |
| 1.2826 | WIP: LeakyReLU(0.5)² MLP on 11L EMA + GPTQ-lite stack (`track_10min_16mb`) | tejas-goyal | #1051 |
| 1.2827 | Non-record: Custom sp4096 BPE Tokenizer (1.2827 BPB on 1×H100) | Nishu2000-hub | #293 |
| 1.2831 | Non-record: 11L GQA + MLP 3 + Partial RoPE + Int6 Attn/MLP + QAT40 | adityasasidhar | #1085 |
| 1.2834 | Non-record: GradPower for Muon prefers p<1 in matched H100 ablation | PapaFranku4647 | #1682 |
| 1.2838 | [Non-record] MLA + SmearGate + BigramHash + SWA — pre-quant 1.2838 bpb | Skrisps26 | #354 |
| 1.2882 | Non-record: Meta-TTT + Cache/OGD Eval Stacking + Tokenizer Ablation | anantdgoel | #384 |
| 1.2890 | Non-Record: QAT + NTK-4096 Eval + Cosine Warmdown + Aggressive SWA (val_bpb=1.2890, 1xh100) | crony-io | #326 |
| 1.2907 | Non-record: GatedDeltaNet SSM via fla library — 1.2907 bpb, 15.79MB | dnldsz | #969 |
| 1.2907 | Non-record: GatedDeltaNet SSM via fla library — 1.2907 bpb, 15.79MB | dnldsz | #970 |
| 1.2917 | Add CTM tail-QAT proxy non-record snapshot | KHUCHAN | #193 |
| 1.2917 | Non-record: Lottery Ticket Hypothesis with a few float parameters | Abhishek-Dalvi410 | #1753 |
| 1.2919 | Non-record submission: Depth-Recurrent U-Net Transformer | Muhammad-Ahmed-Rayyan | #1387 |
| 1.2928 | feat: Add non-record dense 2048 sliding-window ablation submission | abhishekrajdhar | #460 |
| 1.2947 | Improve baseline with LeakyReLU² activation | JianYan11 | #1131 |
| 1.2982 | Non-record: hybrid spiking Transformer (SNN) with a multi-step spiking MLP | tsbiosky | #664 |
| 1.2987 | Non-record: Warmdown-Tuned Training (val_bpb=1.2987) on 1xRTX 5090 | swapp1990 | #146 |
| 1.2988 | Crystal Curriculum — TF-IDF curriculum learning by Bee Bytez | jamesrziggy | #242 |
| 1.3003 | Non-record: HyperparamTuned KV2 + FP16 Embed | xexyz | #271 |
| 1.3029 | Add non-record 1xH100 budget seq2048 run (val_bpb 1.3029) | Aniket-pd | #1261 |
| 1.3036 | RECORD: Denseformer+VRL+XSA on last 4 layers+Gradient Clipping (pending 8xH100 eval) | grim-hitman0XX | #862 |
| 1.3036 | Non-record: LeakyReLU² + BigramHash + Int5/Int6 + SlidingWindow — val_bpb 1.3036 (1×H100) | Syed-M-Zeeshan | #1027 |
| 1.3038 | Add non-record submission for 15L ternary MLP-only 20k | shasank0001 | #1347 |
| 1.3039 | Mixed INT5/INT6 QAT from step 1 (1.3039 bpb) | BruhTheMomentum | #1417 |
| 1.3043 | Non-record: Wider-shallower 4x768 + QAT (1xH100, 1.3043 bpb) | dttdrv | #185 |
| 1.3055 | Non-record: Data ordering & selection — negative result on FineWeb | abaybektursun | #772 |
| 1.3069 | Trunghiu | Hieuabssy | #1136 |
| 1.3069 | LeakyReLU(0.5)^2 + GPTQ + EMA + BigramHash (1.3069) | Hieuabssy | #1137 |
| 1.3069 | [Non-Record] 5L MLP×4 EMA=0.97 Optuna — GH200 proxy, val_bpb=1.3069 (int6+zlib) | Okropniak | #1174 |
| 1.3081 | Add non-record v1 1xH100 LeakyReLU GPTQ-lite submission | hypnoastic | #1444 |
| 1.3092 | Submission Record Series: BatchOpt+MLP4+RoPE100k and 8L EMA Int6 Bigram65k on Single 20GB GPU (val_bpb 1.7810 → 1.3092) | markste-in | #759 |
| 1.3151 | Non-record: Neuromodulatory Depth-Recurrent Transformer with FiLM-only TTT (WIP, val_bpb=1.3151) | nirmathur | #1383 |
| 1.3162 | Non-record: FP16 embed + MLP992 sliding-window size-repair probe | THUQiXuan | #497 |
| 1.3178 | Submit 2026-03-27_PhaseCoherenceGatedGradients | jzgdev | #949 |
| 1.3178 | 2026-03-27_PhaseCoherenceGatedGradients submission | jzgdev | #950 |
| 1.3178 | submission 2026-03-27_PhaseCoherenceGatedGradients PIC-GID + ParallelMuon | jzgdev | #984 |
| 1.3193 | Log MPO tensor train baseline at r=16 (1.3193 BPB) | chinmaypatwardhan-ops | #1078 |
| 1.3208 | Add non-record 1xH200 fp16-embed baseline sweep submission | itu-itis24-buyukhelvacigilm24 | #407 |
| 1.3220 | Frozen Random Backbone + LoRA Adapters (1.322 BPB) | dljr-github | #1548 |
| 1.3220 | Non-record: Frozen Random Backbone + Rank-304 LoRA Adapters (val_bpb 1.3220) | dljr-github | #1549 |
| 1.3223 | Non-record: 30 experiments across 13 architectures (MLA, Pause Tokens, Eigenweight, 9 exotic ideas) | nnm2602 | #1589 |
| 1.3246 | Evolutionary NAS on only a 5 year old MacBook; within 10% of baseline | mike-ferguson | #1627 |
| 1.3250 | Non-record: MC Dropout ensembling is negative for small LMs | abaybektursun | #1021 |
| 1.3262 | Non-record: Ternary MLP Quantization — Void Fraction (val_bpb 1.3262, 10.9MB) | G3sparky | #1733 |
| 1.3267 | Record: 11L Int6 QAT + Warmdown (val_bpb=1.3267, 1xH100) | pkim02 | #488 |
| 1.3274 | Add baseline H100 training report and process docs | xuafeng | #292 |
| 1.3276 | [codex] Validate sliding-window post-quant evaluation on 1xH100 proxy | Kevxn97 | #260 |
| 1.3281 | Non-record: SwiGLU + warmdown fix + quarter batch (1x5090, 1.3281 bpb) | NishantDahal | #73 |
| 1.3286 | PTQ int6-attn + int5-mlp, 20L×256d, mlp=5 — val_bpb 1.3286 | PavelPaha | #1543 |
| 1.3288 | Non-record: Hyperbolic Q/K Lite 1xH100 exploration package | ldh-at | #1074 |
| 1.3299 | Non-record: JEPA-LM Latent Predictive World Model (first JEPA submission) | adi-suresh01 | #1312 |
| 1.3319 | Non-Record: 10L Spatio-Temporal SNN (BPB: 1.3319) | ieuko | #1367 |
| 1.3321 | Add Compiled LeakyReLU2 + Slide64 Eval non-record submission | SHN2004 | #1063 |
| 1.3323 | Add Hybrid Depth-Recurrent Transformer submission | tobiascanavesi | #341 |
| 1.3342 | Record: Depth-Recurrent UT + Rank-1 LoRA Per-Iteration Adaptation — val_bpb 1.3342 | vimeto | #1096 |
| 1.3346 | Muon Optimizer Tuning: val_bpb 1.3346 by jeremyschied | jeremyschied | #794 |
| 1.3355 | Parameter golf jepa | danielxmed | #1097 |
| 1.3358 | Non-record: Stacked hyperparameter tuning + eval2048 (RTX 5090, val_bpb 1.336) | gwelinder | #104 |
| 1.3365 | Non-record: 10L MLP3x + Muon — val_bpb=1.3365 — Single Colab GPU | Durlabhkumarjha | #1190 |
| 1.3379 | Causal Oscillator LM: physics-native architecture (BPB 1.34) | rolandnsharp | #1061 |
| 1.3380 | Non-record: 5L MLP4x + SlidingWindow + SWA + QAT — val_bpb 1.33 (1xH100) | JUSTSUJAY | #842 |
| 1.3428 | Non-record: MDLM Masked Diffusion + Depth Recurrence — val_bpb 1.3428 (8×H100, seed=1337) | He-Wenhao | #1582 |
| 1.3434 | (Non record) 11L Frontier MixedQuant Trigram | armmer016 | #570 |
| 1.3440 | Non-record: Sliding-Window Evaluation + Int8-Zlib Compression (1.34 bpb) | swetapaul08 | #1133 |
| 1.3440 | Non-record: ALBERT-Style Low-Rank Embedding Factorisation (ablation study, 1×H100) | Cayton-Tech | #1481 |
| 1.3441 | EBLS Learned Sharing (10min/16MB) | Robby955 | #433 |
| 1.3446 | Submission: Low-Rank All-Attention (1.3446 bpb) | CRouvroy | #226 |
| 1.3458 | Non-Record: Replace Muon optimizer with NorMuon for baseline (1xH100) | stevenshinechen | #438 |
| 1.3479 | Non-record submission: baseline_sp1024, val_bpb=1.3479(on single H100), AbhiShet108 | AbhiShet108 | #1713 |
| 1.3485 | Non-record: MDLM Masked Diffusion (1.3485 BPB) | Rhoahndur | #1403 |
| 1.3486 | Non-record: Warmdown fix (9x512) on 1xH100 10m | aamodbhatt | #94 |
| 1.3496 | Non-record: ByteJEPA — True Byte-Level JEPA (val_bpb 1.3496) | hardik-bhadani-git | #1443 |
| 1.3509 | Add Parameter Golf submission: Depth12 Dim416 KV4 | AntDX316 | #71 |
| 1.3510 | Add non-record local A100 TTT eval-stride0 submission | DanishjeetSingh | #285 |
| 1.3515 | Grant nonrecord tied blocks | Jaksenc | #717 |
| 1.3517 | Add MPK 8x384 10-minute submission record | DJLougen | #144 |
| 1.3525 | Attention Warm-Start: Initializing Q/K from Bigram Co-occurrence SVD | SPThole | #678 |
| 1.3527 | Non-record: 21L PRP experiment | maksblu | #1235 |
| 1.3529 | Add local baseline reproduction record | bjbjbjbjbjbj | #346 |
| 1.3538 | [non-record] 1xH100 screening: compression + eval strategy | numb3r33 | #938 |
| 1.3540 | Add 128-cluster baseline submission files | danielweidinger2299-debug | #985 |
| 1.3556 | Seq2048 + torch.compile + mid LR (1xA100 draft) | C0neF | #746 |
| 1.3557 | Add non-record recurrent 0011+g2 R768 submission | raider99k | #1203 |
| 1.3557 | [Non Record] Online Curriculum Learning | SPThole | #737 |
| 1.3560 | Non-record: Fused Triton Megakernels — RMSNorm + LeakyReLU² (val_bpb 1.3560) | dentity007 | #1192 |
| 1.3565 | Parallel-Residual+SwiGLU+11layer | Pravin-dev06 | #1751 |
| 1.3571 | Non-record: BESE + Mamba-3 SSD Hybrid (1.3571 BPB, 7.6 MB artifact) | mrbese | #1665 |
| 1.3572 | Add PartialRoPE 16/64 experiment records | inFaaa | #1144 |
| 1.3576 | [Non Record] Fractal recurrent primitive hybrid - SP1024 1xH100 | abbudjoe | #1569 |
| 1.3579 | non-record: MASA low-rank shared attention + SwiGLU, 1.3579 BPB | Zagot-byte | #1025 |
| 1.3587 | Non-record: H-Net Dynamic Chunking — Learned Tokenization Layer (val_bpb 1.3587) | dentity007 | #1191 |
| 1.3587 | [Non-Record] SSM8: Fat State Mamba SSM, BPB=1.3587 | KRGulaj | #1574 |
| 1.3595 | [Non-record] 1-Stage Byte-level H-Net at 17.5M: Dynamic Chunking Learns Word Boundaries (39x-91x fewer params than H-Net paper) | DariusFeher | #1104 |
| 1.3600 | Submission/2026 03 28 masked diffusion | ikermoel | #1053 |
| 1.3620 | submission: LeakyReLU² + EMA + BigramHash(20480) + MLP3.5x | aptsalt | #941 |
| 1.3629 | Non-record: LapushBaby stock baseline 1xGPU RunPod | LapushBaby | #630 |
| 1.3631 | [Non-Record] QAT Dead-Code Analysis + 7 Novel Technique Sweep (1xH100) | wfproc | #1032 |
| 1.3639 | Notable Non-Record: H-Net Dynamic Chunking — 1.3639 BPB — Learned Content-Dependent Segmentation | gowtham0992 | #1168 |
| 1.3660 | Non-record: 1.366 BPB Baseline (SmearGate + Muon, int6, zstd) | nitSubedi | #567 |
| 1.3680 | Non-Record: Full-Model Depth Recurrence Ablation — 7 configs, torch.compile penalty = 0 | codeprakhar25 | #1449 |
| 1.3684 | Add 11L 448x2 PairHash int8+zstd 10-minute submission record | FyeJordy | #749 |
| 1.3693 | Add shared-block recurrent 10-minute non-record 16MB submission | LocalX991 | #1349 |
| 1.3693 | Non-record: Compact 12x384 1xH100 10m | aamodbhatt | #93 |
| 1.3705 | Notable Non-Record: Learning Adapters on Random Linear Maps — 1.3705 BPB | gowtham0992 | #1113 |
| 1.3736 | Submission: val_bpb=1.3736 — 10 layers + Muon + mlp_mult=3; update default NUM_LAYERS and MLP_MULT values | Durlabhkumarjha | #1167 |
| 1.3762 | Non-record: LeakyReLU(0.5)^2 + TrigramHash on PR414 stack (1.3762 bpb, 1xA100) | IshiPareek | #882 |
| 1.3797 | Add non-record 16MB layers7 submission | akshai0296 | #125 |
| 1.3825 | Add non-record submission: 8xH100 FineWeb baseline + TTT eval (val_bpb 1.3825) | sicauzxl | #196 |
| 1.3827 | Record: Gated Residual Scaling (Token-wise) for Attention + MLP - 1.3827 BPB | souro26 | #1671 |
| 1.3868 | Record Submission: Poly5 Softcap + Z-Loss + YaRN + Zstd-22 + Stride-16 (on PR #549 stack) | monisha-max | #1325 |
| 1.3874 | 11L INT7 + MuonWD + SWA (preliminary) | jorge-asenjo | #1258 |
| 1.3900 | Record: Doc-Isolated TTT + Eval Optimizations | vivekvar-dl | #964 |
| 1.3921 | Non-record: 1x H100 SXM5 Explorations | User123331 | #1608 |
| 1.3932 | Non-record: Mixture of Softmax K=2 R=64 (1xH100, 10min, 1.3932 bpb) | User123331 | #266 |
| 1.3969 | Non-Record v2: 7L UNet + Int8 QAT + EMA + Long Train — 1.3969 BPB (DGX Spark) | AlirezaAlampour | #1606 |
| 1.3971 | Non-Record: CAT, Sparsity (Structured and Hessian-Guided), MoE, KAN Negative Results | pireylow | #1537 |
| 1.3999 | Record: LeakyReLU² + XSA4 + LN Scale + Partial RoPE — val_bpb 1.3999 | Programmerryoki | #827 |
| 1.4016 | Submission: aria-redefine-qbit — Hybrid Recurrent U-Net (1.40 BPB) | redefine-qbit | #1577 |
| 1.4054 | [Non-Record] H-Net with Dynamic Sequence Chunking | TimS-ml | #992 |
| 1.4061 | Depth-recurrent transformer: shared block × 12 passes, val_bpb 1.4061, 4.39MB artifact | Sambhav242005 | #386 |
| 1.4072 | Hybrid INL + Sort-Split MoE (1.41/1.46 bpb TTT, 15.5MB, 1xH100) | Complexity-ML | #377 |
| 1.4078 | Submit 1x A100 QAT Fix - 1.4078 BPB (Non-Record) | Shuvam-Banerji-Seal | #527 |
| 1.4078 | Submit 1x A100 QAT Fix - 1.4078 BPB (Non-Record) [v2] | Shuvam-Banerji-Seal | #707 |
| 1.4078 | Submit 1x A100 QAT Fix - 1.4078 BPB (Non-Record) [v3] | Shuvam-Banerji-Seal | #712 |
| 1.4096 | Non-record: LeakyMixer: 11L leaky_relu(0.5)^2 + backoff n-gram mixer | zlxi02 | #830 |
| 1.4100 | Attempt: QK gain 5.5 + deeper recurrence | Vickyrrrrrr | #1616 |
| 1.4106 | Add non-record local A100 PR60-stack reproduction | DanishjeetSingh | #284 |
| 1.4120 | [WIP][non-record] 8L/448 width branch local results | andyluo22 | #588 |
| 1.4182 | Non-record: 24.7M params · int6 · Binary U-Net/SmearGate/BigramHash · 1.5hr · RTX 5060 Ti 16GB | randy06122001-boop | #997 |
| 1.4192 | Non-record: Autoresearch-Guided Optimization — 100+ Experiments + Negative Results | Park-Tae-Hwan | #1418 |
| 1.4222 | non-record 16MB A100 SXM run (10L mixed int5/int6 + EMA + QAT) | zeal175 | #619 |
| 1.4233 | Non-record: Fibonacci Manifold Network + Hybrid Attention + Sparse Braid | Jaredcastorena | #1650 |
| 1.4239 | Add non-record 4090 warmdown submission | SHN2004 | #716 |
| 1.4242 | BSM (Bounded State Manifold) - A box intersection non-transformer architecture, 1.4242 val BPB | dheeren-tejani | #1067 |
| 1.4245 | Non-record: QAT + Neural Cache + LoRA TTT | Bortlesboat | #304 |
| 1.4315 | Add Kshitij submission (1x H100, val_bpb 1.4315, env-based config) | singhaikshitijjain | #994 |
| 1.4352 | Non-Record: JEPA-NTP Auxiliary Losses (Negative Result) | sidhanth97 | #1556 |
| 1.4370 | Record: 11L MLP3x + SmearGate + Error Correction Table | kellyvv | #108 |
| 1.4370 | Record: 11L MLP3x + SmearGate + Error Correction Table | kellyvv | #232 |
| 1.4390 | Non-record: Universal Transformer + Adaptive Density (val_bpb 1.4390) | dentity007 | #1193 |
| 1.4444 | Record: 10-Layer 4xMLP (val_bpb: 1.4444) | hmhm0 | #228 |
| 1.4447 | Notable Non-Record: JEPA — 1.4447 BPB — Joint Embedding Predictive Architecture for LLMs | gowtham0992 | #1116 |
| 1.4457 | [Non-Record Submission] CompressedUT CE + EMA Export + Export-Aligned Late QAT (1.4457 BPB) | mihir-s-05 | #937 |
| 1.4465 | Non-record: Compressor-Aware Training (CAT), differentiable compression proxies for LZ-family compressors | korentomas | #1385 |
| 1.4479 | Non-record: PROTEUS Feature Ablation - Parallel Residuals + Mixed INT5/INT6 + TTT on DGX Spark GB10 | dentity007 | #1425 |
| 1.4500 | [Architectural Proof-of-Concept] Saliency-Boosted GPTQ & High-Entropy Routing | Subramanyam6 | #1558 |
| 1.4508 | Non-record: LeakyReLU(0.9)² slope sweep (local validation, compute pending) | yaowubarbara | #1062 |
| 1.4525 | Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB) with ablations | anantdgoel | #413 |
| 1.4530 | Non-record submission: MLP3x 9L 512d, 1.4530 bpb (1xRTX 4090) | ivanontech | #854 |
| 1.4536 | [Non-Record] MLP3x + WD0.04 + OrthoInit + Sliding Eval — 1.4536 BPB | AymanMahfuz27 | #444 |
| 1.4537 | Add V22 Int6 fast-converging 16MB model (~8min on RTX 4090) | mini-sarami | #1531 |
| 1.4574 | [Non-record] MHALM v1 (1.4574 bpb) | aquemy | #476 |
| 1.4584 | Notable Non-Record: Text Diffusion (MDLM) — 1.4584 BPB — Masked Diffusion Language Model | gowtham0992 | #1119 |
| 1.4612 | Non-record: 11L + XSA4 H100 frontier (1.4612 BPB legal) | chrisnkuno | #554 |
| 1.4617 | [Non-record] JEPA v2 — Why same-sequence next-k JEPA collapses in causal LMs | luciobaiocchi | #1330 |
| 1.4689 | Non-record: Depth Recurrence Sweep — Systematic Layer Loop Ablation | krishs0404 | #1726 |
| 1.4702 | experiments: MODEL_DIM=256, MLP_MULT=3, WARMDOWN fix - best bpb 1.4702 | 0xtigerclaw | #618 |
| 1.4707 | [Non-record] TTT-E2E: Meta-learned test-time training via FOMAML | abaybektursun | #1222 |
| 1.4709 | Non-record: Olmo Hybrid (GDN + Attention) for long-context training — 8k/16k/32k crossover study | aarjunsrinivasan | #1371 |
| 1.4716 | [Submission] SwiGLU MLP (under 16MB) | Abhinav-Avasarala | #1391 |
| 1.4716 | [Submission] SwiGLU MLP (under 16MB) | Abhinav-Avasarala | #1393 |
| 1.4750 | 11L AttnRes + Gated Attention + Looped Blocks + EMA + Cosine + QAT | Neopolita | #607 |
| 1.4765 | Non-record: Nemotron-H Mamba-3 Hybrid + First SSM Depth Recurrence (1.4765 BPB) | inin-zou | #1607 |
| 1.4775 | Non-record: BigramHash(4096) + Cosine EMA + LZMA-9 | Alfaxad | #681 |
| 1.4784 | First submission | markste-in | #408 |
| 1.4816 | Submission: Mamba SSM byte260 | nicholasbailey87 | #1342 |
| 1.4831 | Non-record: 19.2M MDLM Text Diffusion: fp8 e4m3 + EMA 0.999 + Muon LR 0.02 | lsb | #1699 |
| 1.4841 | Non-record: 28 Experiments in 5 Days — What Works, What Fails, and Why Small-Scale Tests Lie | himanshudongre | #1227 |
| 1.4893 | Non-record: Sliding Patch Attentions + MoE (2-layer compact run) | BurguerJohn | #981 |
| 1.4942 | Add TRN hybrid non-record submission (1.4942 bpb, 1x RTX 5090) | amabito | #669 |
| 1.4963 | Non-record: Basis Block Interpolation (novel negative result) + Hyperparameter Sweep (MATRIX_LR=0.03 improves SOTA by 0.059 bpb) | j420 | #530 |
| 1.5000 | Non-record: Looped Transformer + LoRA + Skip Connections + NorMuon + SWA + Int6 + Sliding Window | MatthewHRockwell | #103 |
| 1.5000 | T5: Phase-Based Depth Recurrence + MLA + Graduated Precision (Non-Record) | Jonas-T5 | #739 |
| 1.5080 | Non-record: Retrodiction Training (Petz Recovery Map) — val_bpb 1.508 | akaiHuang | #1183 |
| 1.5080 | Non-record Submission: Text Diffusion + Retrodiction + TTT + Depth Recurrence | akaiHuang | #1255 |
| 1.5096 | Non-record: MLX tuned hyperparameters — 1.5096 BPB local (H100 pending) | seekerPrice | #1612 |
| 1.5096 | Non-record: CUDA port of PR #1612 recipe (H100 pending) | seekerPrice | #1614 |
| 1.5134 | Review: Rerun of #972 with actual full-vocab normalization | AnirudhRahul | #978 |
| 1.5140 | Non-record: Family 1A tied blocks (1xH100 dev snapshot) | jaksenc | #536 |
| 1.5164 | [Non-record] Quantization Findings: SWA Reversal + Int5 Failure | kellyvv | #238 |
| 1.5194 | [Non-record] H-Net MAMBA Outer-Layer Ablation: OL2 collapses, OL1 converges to 1.5194 INT6 BPB | aiejvn | #1757 |
| 1.5207 | Non-record: 131 Systematic Experiments — 1.5207 BPB on RTX 4000 Ada | ranausmanai | #1434 |
| 1.5248 | Non-record: 1xH100 auto-research int6 policy sweep | aamodbhatt | #502 |
| 1.5252 | Submit 1x A100 QAT Fix - 1.5252 BPB (Non-Record) [v4] | Shuvam-Banerji-Seal | #719 |
| 1.5252 | Submit 1x A100 QAT Fix - 1.5252 BPB (Non-Record) [v5] | Shuvam-Banerji-Seal | #725 |
| 1.5252 | Single A100 QAT Performance Fix (fresh review cycle) | Shuvam-Banerji-Seal | #751 |
| 1.5275 | Non-record: TrigramHash — iso-parametric bigram(96)+trigram(32), val_bpb=1.5275 (1xH100) | fleeb83 | #504 |
| 1.5283 | RQZ-Golf v1: Depth recurrence for parameter efficiency | TheCause | #54 |
| 1.5295 | Add non-record 1x5090 autoresearch submission with two-campaign analysis | jadechip | #432 |
| 1.5348 | Non-record: TernaryRecurrentGPT - ternary 1.58-bit MLP + depth recurrence (1xL4 val_bpb=1.5348) | Parswanadh | #559 |
| 1.5363 | Submission: Recursive Layer Sharing (13.9 MB, 1.53 BPB) | negrurv | #1542 |
| 1.5364 | Applied Async Prefetching Boost Performance of Any Approach | SirSaltySalmon | #785 |
| 1.5382 | Non-record: TTT + QAT on Consumer GPU (val_bpb=1.5382) | Dannybc123 | #263 |
| 1.5390 | [Notable Non-Record Submission] Everything Everywhere All in One Bit: XNOR-mally I'd use floats - 118M XNOR-Net - 1.539 BPB - 10-Min and Unconstrained Runs | CiprianFlorin-Ifrim | #1388 |
| 1.5406 | [Non-Record] Geodesic Topological Tokenizer + BigramHash + BRN | skar07 | #1571 |
| 1.5516 | Non-record: 1x RTX 3090 baseline run (sp1024, 1 shard) | meett07 | #405 |
| 1.5546 | [Non-Record] 26.5M Int6 QAT + EMA (Pending Compute) | DevWizard-Vandan | #1436 |
| 1.5568 | Non-record: Blueprint Stack + ProgSeq + Multi-scale RoPE + ByteEmbed — val_bpb 1.5568 (1xRTX 3080) | Blakethefn | #1411 |
| 1.5633 | Mamba-3 SSD + Attention Hybrid with QAT (1.5633 bpb) | mradassaad | #1107 |
| 1.5645 | [record] LR warmdown v1 (WARMDOWN_ITERS=900) with confirmed 10-min runs | intelligentiaomni | #1163 |
| 1.5672 | Non-record: Mac mini M4 16GB, no H100s, still golfing (val_bpb=1.5672) | frido22 | #643 |
| 1.5879 | submission: QK Gain Init 1.2 + Sliding Window Eval (stride=64) | outsourc-e | #259 |
| 1.5890 | Depth recurrence: 3 unique layers x 3 loops, 1.589 BPB | koushikkethamakka | #91 |
| 1.5918 | [Non-Record] Whirlpool v5b — Non-Euclidean Lorentzian Attention on the Hyperboloid Manifold | tmancino | #1239 |
| 1.5992 | Add non-record 16MB submission: Hybrid Sparse Diffusion 2H on 8xH100 | ymrohit | #1198 |
| 1.6004 | Non-record submission: recurrent 512 L3 6k (8x H100, 224s) | estesryan | #213 |
| 1.6070 | Non-record: Random Linear Maps + Learned Adapters (val_bpb=1.607, 1.92MB artifact) | fielding | #874 |
| 1.6110 | Non-record: Faithful Conditional Memory | wisebreadloaf | #1490 |
| 1.6114 | Non-record: local RTX 4070 SP1024 8x512 KV4 seq768 500-step run | riatzukiza | #247 |
| 1.6130 | Radial bitnet submission | rthgit | #435 |
| 1.6200 | N-gram logit boost + HedgeMixer + score-first TTT | haimianbaobao007 | #1014 |
| 1.6231 | Non-record: local RTX 4070 SP1024 8x512 KV4 500-step run | riatzukiza | #248 |
| 1.6252 | [non-record] Masked Diffusion Language Model (val_var_bpb=1.625) | mtybadger | #820 |
| 1.6323 | SP8192 + Depth Recurrence + Parallel Residuals (14.09MB) | dippatel1994 | #1499 |
| 1.6371 | records: add 2026-03-27 TrigramHash run, harden quantization safety, and clean docs | Mister2005 | #1201 |
| 1.6372 | Non-record: Muon-Aware QAT + LAWA + Adaptive LR Scheduling (7 toggleable improvements) | mohosy | #130 |
| 1.6507 | Add non-record submission: faithful KV-cache quantization backends on 1x RTX 3090 | LucasErcolano | #1149 |
| 1.6507 | Non-record: Triton KV-cache backend for autoregressive eval | LucasErcolano | #1153 |
| 1.6542 | Non-record: Random Linear Map Adapter Projections — 1.21MB artifact (val_bpb=1.6542) | anthony-maio | #974 |
| 1.6572 | Non-record: local RTX 4070 SP1024 7x512 KV4 500-step run | riatzukiza | #258 |
| 1.6577 | Non-record: local RTX 4070 shared-depth RMS interface v0 | riatzukiza | #276 |
| 1.6644 | Submit Lim Shiaw Yong: 1.66 BPB 12MB Squeeze Architecture | shiawyonglim | #1620 |
| 1.6656 | Non-Record: U-Net Transformer + Int8 QAT + LeakyReLU² + Muon — 1.6656 BPB (DGX Spark) | AlirezaAlampour | #1484 |
| 1.6656 | Non-Record: U-Net Transformer + Int8 QAT + LeakyReLU² + Muon — 1.6656 BPB (DGX Spark) | AlirezaAlampour | #1486 |
| 1.6660 | Non-record: local RTX 4070 SP1024 7x512 KV2 500-step run | riatzukiza | #240 |
| 1.6768 | Varun - 1st Submission | Mister2005 | #1200 |
| 1.6924 | Non-record: Faithful mHC-lite | wisebreadloaf | #1491 |
| 1.7195 | Non-record: knowledge distillation teacher-student submission | Jeneesh1014 | #1034 |
| 1.7232 | non-record: LR warmdown on 1x A40 (1.723 bpb, 8.40MB) | my-sonicase | #313 |
| 1.7270 | Non-record: GPTQ-lite Scale Clamp Fix + 6-bit Packing + Depth Recurrence on Stack B | Rome-1 | #1389 |
| 1.7510 | Non-record: BitNet b1.58 + depth recurrence + NorMuon (1.7510 BPB, 3.78 MB) | Athenox14 | #126 |
| 1.7622 | Non-record: JEPA Hybrid — first latent-prediction LM (1.7622 BPB, 7.5MB) | butbutt42 | #1685 |
| 1.7757 | Non-record: Polar STE QAT for structural weights | LucasErcolano | #1154 |
| 1.7942 | Non-record: Connectome-JEPA — Sparse I/O Bottleneck (val_bpb 1.7942 ± 0.003, 1.97MB artifact, first JEPA submission) | ericdatum | #1152 |
| 1.8111 | Add classical doc-copy 16.3M lzma submission | Muhtasham | #902 |
| 1.8184 | Non-record: 1.8184 BPB Single-step Recurrent Transformer with Q-LoRA (Windows 3090) | Ribin545 | #1299 |
| 1.8184 | Non-record: 1.8184 BPB Single-step Recurrent Transformer with Q-LoRA (Windows 3090) | Ribin545 | #1300 |
| 1.8338 | Non-record: PR315 repro on 1xH100 PCIe, int6+zstd (val_bpb=1.8338) | sjp611 | #356 |
| 1.8389 | Add 10L 4K long-context negative-result submission | takoyakisoft | #237 |
| 1.8440 | Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440) | cschubiner | #56 |
| 1.8480 | [WIP] SSM LRU Baseline — First State Space Model Submission | timothywangdev | #220 |
| 1.8522 | Non-record: DART - Differential Attention Recurrent Transformer (Student submission, Kerala) | anandks2006 | #345 |
| 1.8587 | Non-record: Prefix-Conditioned Suffix Diffusion — True Discrete Diffusion (diffusion_pll_bpb=1.8587) | anthony-maio | #905 |
| 1.8658 | Non-record JEPA submission: VRS (Void Rescue System) | ikermoel | #1513 |
| 1.8698 | Depth Recurrence: 3x3x1024 (non-record, pending H100) | Marvbuster | #79 |
| 1.8989 | H-Net: First Learned Byte-Level Tokenization (README Wishlist) -- 1.90 BPB, 22M params | greqone | #1044 |
| 1.8990 | Non-record: Skill Forge — Autonomous ML Experimentation System (Local RTX 4070) | FlynnCruse | #645 |