| 0.0000 | Middle-Out Compression: 0.0000 bpb (Shannon Limit Broken) | hypery11 | #721 |
| 0.0000 | Record: Nacrith Log-Bias + Full-Rescore N-gram — val_bpb 0.00000035 (3-seed mean) | himanalot | #959 |
| 0.0109 | Record: Packed Causal N-gram + Dirichlet Backoff — val_bpb 0.0109 (3-seed mean, NEW SOTA) | sofiabod | #1076 |
| 0.0165 | Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean) | aamodbhatt | #943 |
| 0.0165 | Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean) | aamodbhatt | #944 |
| 0.0180 | Record: Packed Causal N-gram + Dirichlet Backoff — val_bpb 0.0180 (3-seed mean) | sofiabod | #1056 |
| 0.0214 | Record: 0.0214 bpb - Low Eval-Time Memory Regime: Packed Training N-gram Artifact + Learned Gate (No Phrase Cache) | AnirudhRahul | #962 |
| 0.0235 | Record: Packed N-gram + Dirichlet CTW — val_bpb 0.0235 (1xB200) | minh-stakc | #1114 |
| 0.0274 | Record: Order-16 Frozen N-gram Oracle + Learned Gate + TTT — val_bpb 0.0274 (3-seed mean) | TimPietrusky | #945 |
| 0.0280 | Order-16 Frozen N-gram Oracle + Score-First TTT (0.02801 BPB) | THUQiXuan | #924 |
| 0.0281 | Record: Frozen N-gram Oracle (Order-16) + Score-First TTT (0.02807 BPB) | THUQiXuan | #925 |
| 0.0308 | Order-13 N-gram Oracle + Score-First TTT (0.0308 BPB) | THUQiXuan | #883 |
| 0.0498 | Record: 0.0498 bpb - Packed Training N-gram Artifact + Learned Weighting Gate | AnirudhRahul | #931 |
| 0.0638 | Record: Fort Knox — Legal Packed Training Cache, Zero Val Adaptation (val_bpb 0.0638, 3-seed) | haikosys | #982 |
| 0.0804 | Record: CacheMoney — 0.0804 BPB (3-seed mean, std 0.00003) | haikosys | #933 |
| 0.0830 | Evidence-aware Dirichlet concentration, 35% improvement over fixed c=5.0 | immartian | #1024 |
| 0.0830 | Record: Packed N-gram + Two-Pass Dirichlet CTW — val_bpb 0.0830 (3-seed mean) | sofiabod | #986 |
| 0.0881 | Record: 0.0881 BPB — 11L Int5 GPTQ + Order-12 N-gram + Phrase Cache + 65K Chunks | callithyia | #961 |
| 0.0887 | Record: Cache Is All You Need — val_bpb 0.0887, 622KB artifact (3-seed mean) | RoyiRa | #913 |
| 0.0905 | Record: Seed-Regenerated Random Model + Incremental N-gram Cache — val_bpb 0.0905 | vimeto | #1095 |
| 0.0935 | Record: BROADSIDE — Full-Rescore N-gram Cache (val_bpb 0.0935) | simon-marcus | #870 |
| 0.0939 | Record: Order-13 Full-Rescore N-gram + 11L Int6 GPTQ — val_bpb 0.0939 (3-seed mean) | TimPietrusky | #921 |
| 0.0942 | Record: Fast Full-Rescore N-gram — val_bpb 0.09420444 (3-seed mean) | aamodbhatt | #888 |
| 0.0960 | Record: Two-Pass Order-12 Shared N-gram Tables — val_bpb 0.0960 (3-seed mean) | resouer | #907 |
| 0.0972 | Record: Order-14 N-gram Full-Rescore — val_bpb 0.0972 | greqone | #922 |
| 0.0990 | Record: WaterLOO — Full-Rescore N-gram Cache with Self-Exclusion (val_bpb 0.0990) | simon-marcus | #881 |
| 0.1003 | Record: PhraseCache + OrderAdaptive N-gram + RegimeTracker — val_bpb 0.1003 (3-seed mean) | RoyiRa | #880 |
| 0.1130 | Record: Single-Pass Packed N-gram + Dirichlet CTW — val_bpb 0.1130 (3-seed mean) | sofiabod | #1030 |
| 0.1154 | Record: Order-20 Dirichlet Posterior + Phrase Cache — 0.11545 BPB (3-seed) | dentity007 | #968 |
| 0.1156 | Two-Level Dirichlet Posterior + Phrase Cache — 0.11556 BPB (3-seed) | dentity007 | #948 |
| 0.1156 | Record: Two-Level Dirichlet Posterior Mixing with Per-Order OBCL -- 0.1156 BPB | Robby955 | #900 |
| 0.1181 | Record: Budgeted Two-Pass N-gram Backoff — val_bpb 0.11814796 (3-seed mean) | aamodbhatt | #868 |
| 0.1290 | Record: N-gram Two-Pass Score-First Evaluation (0.1290 BPB) | THUQiXuan | #869 |
| 0.1310 | Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon — val_bpb 0.1310 (3-seed) | aryanbhosale | #893 |
| 0.1315 | Record: Two-Pass Order-12 N-gram Backoff + 256K Chunks — 0.1315 BPB | quietsmile | #853 |
| 0.1434 | Record: Two-Pass N-gram Rescoring (val_bpb 0.1434) | himanshudongre | #846 |
| 0.1582 | Record: 0.1582 BPB — Learned Mixer Head + No TTT + Matrix LR 0.03 | bigbag | #859 |
| 0.1653 | Record: TurboQuant + Full-Rescore N-gram (val_bpb=0.1653) | haikosys | #918 |
| 0.1663 | Record: 0.1663 BPB - N-gram-Aware Training + Frozen N-gram Oracle + Backoff TTT | AnirudhRahul | #834 |
| 0.2071 | submission: Order-Adaptive N-gram Cache — 0.2071 BPB | RoyiRa | #851 |
| 0.2282 | [RECORD] L-BFGS SLOT + Entropy-Adaptive N-gram Mixer (0.2282 BPB) | ChideraIbe123 | #1507 |
| 0.2532 | The Kitchen Sink | MichaelMcCulloch | #1111 |
| 0.2834 | Record: Order-12 N-gram Backoff + 256K Chunks — 0.2834 BPB | quietsmile | #843 |
| 0.2841 | Record: 11L Parallel Muon + N-gram Backoff Cache — val_bpb 0.2841 (3-seed mean) | aryanbhosale | #864 |
| 0.2841 | Record: 11L Parallel Muon + N-gram Backoff Cache — val_bpb 0.2841 (3-seed mean) | aryanbhosale | #865 |
| 0.2873 | Record: 0.2873 BPB — Fine-Grained N-gram Cache (65K chunks) | quietsmile | #840 |
| 0.2951 | Record: Order-9 N-gram Backoff + Score-First TTT + GPTQ-Int5 (0.2951 BPB) | himanshudongre | #826 |
| 0.2952 | Record: Chunk-Based N-gram Backoff + Score-First TTT (0.295 BPB) | AayushBaniya2006 | #809 |
| 0.2988 | Score-First TTT + Causal N-gram (order=82) — val_bpb 0.29882 (3-seed mean) | renqianluo | #1605 |
| 0.3212 | Record: 0.3212 BPB — Complementary N-gram 65K + Int5 GPTQ + LoRA TTT | callithyia | #850 |
| 0.3461 | 10L + PPM Full-Rescore Order-12 N-gram (0.3461 BPB) | Bortlesboat | #912 |
| 0.3461 | 10L + PPM Full-Rescore Order-12 N-gram (0.3461 BPB) | Bortlesboat | #916 |
| 0.3509 | Non-record: TurboQuant + N-gram Hybrid Eval + TTT (1xH100 NVL) | bsisduck | #1452 |
| 0.3509 | Non-record: TurboQuant + N-gram Hybrid Eval + TTT (1xH100 NVL) | bsisduck | #1454 |
| 0.3693 | Add non-record 16MB submission: Dirichlet PPM + Legal TTT on 8xH100 | JDAppleseed | #1159 |
| 0.3779 | RFC: A framework for deciding the n-gram question | abaybektursun | #886 |
| 0.3922 | Normalized N-gram + Bayesian First-Match (val_bpb 0.3922) | Idan3011 | #972 |
| 0.3964 | Record: Per-Sample SLOT + N-gram Order-22 + TTT + LR=0.432 — val_bpb 0.39642 (3-seed mean) | renqianluo | #1430 |
| 0.4027 | Record: 0.4027 BPB — Swarm-Designed Causal BackoffNgramMixer (3-seed mean, std 0.0015) | michaelwinczuk | #1094 |
| 0.4118 | Record Submission: HDC_1_Step_Grad_DSV_Radial_Slyvester_Hadamard_Matrix_Symmetry_Language_Model_val_bpb: 0.4118 | viasky657 | #1461 |
| 0.4162 | Record: 0.4162 BPB mixed quant ngram (post-fix reruns) | LucasErcolano | #1379 |
| 0.4188 | Record: 0.4188 BPB mixed quant ngram | LucasErcolano | #1359 |
| 0.4311 | Record: 0.4311 BPB - Complementary Training + Backoff N-gram Mixer + TTT | Naazimsnh02 | #1033 |
| 0.4377 | Record: Complementary Training + Backoff N-gram Mixer — 0.4377 BPB | quietsmile | #811 |
| 0.4380 | V18 Manifold-Guided Architecture — val_bpb 0.434 | raahilg | #663 |
| 0.4405 | Record: Order-Adaptive 9-gram Backoff + Distributed Prefill — val_bpb 0.4405 (3-seed mean) | sofiabod | #890 |
| 0.4416 | Record: 0.4416 BPB -- Complementary Training + Backoff N-gram Mixer | pentxayc | #803 |
| 0.4820 | Record: X-WING 3D Cubric + Complementary Training (val_bpb=0.4820) | newjordan | #814 |
| 0.4961 | Bandit: ClownCar Crawler x Cubric Ngram9 — 0.4961 BPB, 9.9mb | newjordan | #1083 |
| 0.5000 | [Non-record] ABRAM_CHIP v2 — HECR int16 ultra compact — 34 KB — 0.50 bpb | abrahaw123-cell | #475 |
| 0.5116 | Add Nairi submission: 9L 512D vocab1024 | SlavH | #1723 |
| 0.5440 | Record: Order-Adaptive BackoffMixer (mean val_bpb=0.5440) | hypery11 | #825 |
| 0.5466 | Record: Order-Adaptive Entropy Gating + BackoffNgramMixer (val_bpb=0.5466) | travispchen | #798 |
| 0.5527 | [record] add HWNODE record - 0.5527 | lucamignatti | #818 |
| 0.5588 | parameter golf submission - Julius | magicjulio | #722 |
| 0.5601 | Record: Chimera TTT — K-Projection LoRA + Min-NLL (0.5601 BPB, 3-seed mean) | teddyoweh | #611 |
| 0.5644 | Record: X-WING — Shared N-gram Tables + Cubric (val_bpb=0.5644) | newjordan | #800 |
| 0.5755 | Add Conker-5 tandem residual exact experts non-record submission | asuramaya | #998 |
| 0.5863 | 10L + Two-Pass Order-11 N-gram Backoff (0.5863 BPB) | Bortlesboat | #876 |
| 0.6361 | Record: Per-Sample SLOT + TTT + LR=0.024 + Stride=96 — val_bpb 0.63614 (3-seed mean) | renqianluo | #1328 |
| 0.6361 | Record: Per-Sample SLOT + TTT + LR=0.024 + Stride=96 — val_bpb 0.63614 (3-seed mean) | renqianluo | #1329 |
| 0.6364 | Record: 0.6364 BPB - Depth Recurrence + Multi-Order N-gram Backoff | Naazimsnh02 | #808 |
| 0.6430 | Record: DeepQuant V10b — 11L INT6 + 8ep LoRA TTT (val_bpb=0.6430) | AriaAnima | #596 |
| 0.6567 | Record: 0.6567 BPB — Prefill Cache + 7-Gram Entropy-Adaptive + EBLS | Robby955 | #796 |
| 0.6580 | Record: Trinity SLOT v3 + Pre-Quant TTT — val_bpb 0.65802 (3-seed mean) | deborahnelson8788726 | #1722 |
| 0.6671 | Record: BackoffNgramMixer (mean val_bpb=0.6671) | hypery11 | #813 |
| 0.6672 | Record: 11L + Multi-Order N-gram Backoff + Entropy-Adaptive Alpha (val_bpb=0.6672) | minh-stakc | #770 |
| 0.6678 | Record: Backoff N-gram Cache + LeakyReLU(0.9)² (val_bpb=0.6678) | ibarrajo | #806 |
| 0.6683 | Record: BackoffNgramMixer + Drift-Free TTT (3-seed mean val_bpb=0.6683) | deanbrr | #779 |
| 0.6846 | Notable Non-Record: H-Net Tokenization — 0.6846 BPB — Hierarchical Token Processing | gowtham0992 | #1121 |
| 0.6864 | Record: 0.6864 BPB — K-LoRA + Min-NLL + FlashAttention-3 | bigbag | #614 |
| 0.6951 | Record: 11L LeakyReLU² XSA-all GPTQ-AR SLOT64 — 0.6951 BPB | canivel | #1319 |
| 0.7093 | Non-record: Discriminative TTT + SLOT-24, 3-seed verified (8xH100 SXM) | Abhishek8108 | #1414 |
| 0.7094 | Record: SLOT-24 + Pre-quant TTT — val_bpb 0.7094 (3-seed mean) | stukenov | #1376 |
| 0.7139 | Record: LeakyReLU(0.5)² + Legal Per-Document LoRA TTT + GPTQ-lite (mean val_bpb=0.7139, 3 seeds) | robinojw | #762 |
| 0.7227 | Record: 0.7227 BPB — 10L LoRA TTT 6ep + FlashAttention-3 | bigbag | #605 |
| 0.7406 | Record: SLOT-48 — val_bpb 0.7406 (3-seed mean) | anthony-maio | #1321 |
| 0.7614 | ClownCar: Frugendorff compression baseline + canonical DeltaNet integration | newjordan | #990 |
| 0.7625 | [Non-record] Megakernel Saturation Study: 5 Triton fusion variants cannot beat torch.compile at 27M scale | ChideraIbe123 | #1679 |
| 0.7853 | Record: PROTEUS v8 — 11L INT6 + LoRA TTT 5ep cosine (mean val_bpb=0.7853, 4 seeds) | MatoTeziTanka | #568 |
| 0.8004 | Non-record (WIP): Multi-Order N-gram Backoff — val_bpb=0.8004 (1xH100 proxy) | greqone | #871 |
| 0.8100 | Ternary Universal Transformer — 15.6MB, bfloat16, Muon optimizer | alons23 | #216 |
| 0.8104 | Medusa: Unstable — DeltaNet Crawler 0.8104 BPB (best seed), 10mb file size, mean 0.9984, Frugendorff continuation | newjordan | #1028 |
| 0.8128 | 0.8128 BPB: Classical Compression Eval + N-gram Backoff on PR #549 Base | shinegami-2002 | #786 |
| 0.8173 | Record: 11L + Score-Every-Epoch LoRA TTT 5ep (3-seed mean val_bpb=0.8173) | minh-stakc | #642 |
| 0.8265 | Record: SP1024 + SLOT-24 + QK5.25 + Pre-Quant AdamW TTT — val_bpb 0.8265 (3-seed mean) | ndokutovich | #1488 |
| 0.8275 | Record: val_bpb 0.8275 (3-seed mean) — SLOT-28 + VRL + QK-Gain 4.0 + XSA-11 | yahya010 | #1324 |
| 0.8503 | non-record submission: mean-delta warm start + depth recurrence for SLOT (0.8503 BPB) | JKSNS | #1368 |
| 0.8508 | PROTEUS+STYX — val_bpb 0.8508 (3-seed mean) — LeakyReLU(0.9)² + 5-gram Eval Cache | MatoTeziTanka | #769 |
| 0.8609 | Record: 11-gram Eval Cache + Hedge Mixer (val_bpb: 0.8609) | sunnypatneedi | #909 |
| 0.8609 | Record: 11-gram Eval Cache + Hedge Mixer (val_bpb: 0.8609) | sunnypatneedi | #963 |
| 0.8637 | Record: SLOT-24 Aggressive — val_bpb 0.8637 (3-seed mean) | anthony-maio | #1313 |
| 0.8705 | Record: 11L EMA + GPTQ-lite + LeakyReLU^2 + QAT@0.15 | NandhuRajRK | #926 |
| 0.8822 | (0.8822 BPB mean) Medusa: Unstable S2 — DeltaNet Crawler, Legal 10mb, 0.77 bpb single seed | newjordan | #1047 |
| 0.8881 | Record: 11L + order-adaptive 11-gram (mean val_bpb=0.8881) | hypery11 | #795 |
| 0.8960 | Record: 7-gram N-gram Cache (0.8960 bpb) | armantsaturian | #797 |
| 0.9059 | Record: 11L + order-adaptive 9-gram backoff (mean val_bpb=0.9059) | hypery11 | #788 |
| 0.9076 | Record: 0.9076 BPB — 10L + N-gram Backoff + Matrix LR 0.03 | bigbag | #828 |
| 0.9123 | 10L + Multi-Order N-gram Backoff (0.9123 BPB) | Bortlesboat | #802 |
| 0.9209 | Non-record 1xH100 backoff7gram zlib-fallback sign-of-life (val_bpb 0.9209) | RichiiiTV | #767 |
| 0.9258 | Record Submission: 0.9258 BPB — Kitchen Sink (7-gram + XSA6 + BigramHash4K + Cosine TTT) | agalimova | #776 |
| 0.9300 | Record: Scored-Position SLOT + Per-Sample Delta + GPTQ (val_bpb: 0.9300) | resouer | #1229 |
| 0.9354 | Record: 11L LeakyReLU² + XSA-all + QK-Gain 4.0 + Full GPTQ + SLOT — val_bpb 0.9354 (3-seed mean) | xexyz | #1263 |
| 0.9362 | Podracing III: Cubric Lite — 0.9362 BPB | newjordan | #782 |
| 0.9370 | Record: Order-Adaptive Entropy Gating + XSA-All (val_bpb=0.9370) | travispchen | #774 |
| 0.9393 | Record: EMA-GPU + Multi-Order N-gram Backoff + PE Confidence (val_bpb=0.9393) | Idan3011 | #810 |
| 0.9443 | Record: LeakyReLU(0.5)² + Per-Document LoRA TTT (mean val_bpb=0.9443, 3 seeds) | robinojw | #620 |
| 0.9462 | Record: SLOT + QK-Gain 4.0 + XSA-11 — val_bpb 0.9462 (3-seed mean) | anthony-maio | #1303 |
| 0.9485 | Record: Scylla + Full GPTQ + XSA-all + FA3 — val_bpb 0.9485 (3-seed mean) | icryo | #1184 |
| 0.9512 | Record: PROTEUS v7 — 11L INT6 + LoRA TTT (mean val_bpb=0.9512, 3 seeds) | MatoTeziTanka | #512 |
| 0.9581 | Record: Score-First TTT + N-gram Backoff (3-seed mean val_bpb=0.9581) | Asukabot0 | #761 |
| 0.9581 | Record: Score-First TTT + Multi-Order N-gram Backoff (3-seed mean val_bpb=0.9581) | antaloaalonso | #940 |
| 0.9588 | [Val Only]: MLP 3x + STE int6 QAT + sliding window, val_bpb=0.9588 | andrewgcodes | #120 |
| 0.9605 | Record: 11L Full GPTQ + Multi-Order N-gram Backoff (fixed-alpha 0.9757 / entropy-adaptive 0.9605, 3-seed) | raahilshah | #778 |
| 0.9623 | Record: 0.9623 BPB — 7-Gram Entropy Cache + XSA-all + EBLS | Robby955 | #777 |
| 0.9625 | Podracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964) | newjordan | #753 |
| 0.9631 | Record: 11L XSA + Mixed INT6 + Adaptive N-gram Cache (2->7 backoff) - val_bpb=0.9631, 3-seed | aerosta | #993 |
| 0.9633 | Record: Curriculum Learning + LeakyReLU(0.9)² + 7-gram Backoff (val_bpb=0.9633) | ndokutovich | #764 |
| 0.9641 | [10min_16mb] 0.9641 BPB: LeakyReLU² + Score-First TTT + N-gram Backoff Cache | skoustav35 | #1185 |
| 0.9642 | Record: N-gram Backoff + VRL + LeakyReLU² — val_bpb 0.9642 (3-seed mean) | anthony-maio | #887 |
| 0.9642 | Record: N-gram Backoff + VRL + LeakyReLU² — val_bpb 0.9642 (3-seed mean) | anthony-maio | #889 |
| 0.9642 | Non-record: Fused Softcap+CE Megakernel (1.94x vs torch.compile) + N-gram Backoff | anthony-maio | #915 |
| 0.9650 | Record: Trinity Ternary GPT — val_bpb 0.9650 (ternary roundtrip) | deborahnelson8788726 | #1246 |
| 0.9674 | Record: First Legal Sub-1.0 BPB — Multi-order N-gram Backoff + Entropy-Adaptive Alpha (val_bpb=0.9674, 3-seed) | Asukabot0 | #727 |
| 0.9693 | SOTA Record: Novel Test-Time Method TARA Val BPB=0.97 under 4min (training-free unlike TTT) | sanyalsunny111 | #1055 |
| 0.9789 | Record*: val_bpb=0.978 BPB — Goldfish ML Autonomous Research (100ep Cosine *leaky* TTT) | lukacf | #517 |
| 0.9850 | Record: Cosine TTT + Multi-Order N-gram Cache (3-seed mean val_bpb=0.9850) | andrewbaggio1 | #741 |
| 0.9850 | Non-record: Negative Results — Architecture, TTT Variants, Quantization, and N-gram Cache Illegality | andrewbaggio1 | #1186 |
| 0.9901 | MDLM Diffusion — val_var_bpb 0.9901, EOS learning + full dataset shard rotation, 33M params, 1x AWS A10G | aiejvn | #1241 |
| 0.9917 | Record: 11L XSA-all + backoff 7-gram (mean val_bpb=0.9917) | hypery11 | #763 |
| 0.9958 | Record: LeakyReLU(0.9)² + N-gram Cache + Entropy-Reg QAT — val_bpb 0.9958 (3-seed mean) | lolrazh | #885 |
| 1.0030 | Non-record: 10L Gated DeltaNet (PureGDN) — val_bpb 1.003028 (3-seed mean, legal TTT) | Christopher-Lee-McClendon | #1370 |
| 1.0046 | Record: L-BFGS Causal SLOT — val_bpb 1.0046 (3-seed mean) | resouer | #1350 |
| 1.0050 | V20: Cascaded 2-Phase L-BFGS Causal SLOT (1.00497 BPB, 3-seed) | Bortlesboat | #1372 |
| 1.0095 | Record: TTT-AdamW + SLOT L-BFGS25 LogitDelta + GPTQ DAMP=0.005 — val_bpb 1.00955 | renqianluo | #1318 |
| 1.0098 | Record: GatedDeltaNet FLA + Score-First TTT + Brotli — val_bpb 1.00980 (3-seed mean) | aamodbhatt | #1711 |
| 1.0099 | Record: GatedDeltaNet (FLA) + Legal Score-First TTT — val_bpb 1.00995 (3-seed mean) | arsenis-cmd | #1698 |
| 1.0108 | Record: GatedDeltaNet + Legal TTT + Brotli-11 — val_bpb 1.01080 (3-seed mean, VALID artifacts) | yahya010 | #1734 |
| 1.0116 | Non-record: Sequential Momentum TTT (val_bpb=1.0116, 3-seed mean, 4xA10G) | connectwithprakash | #807 |
| 1.0119 | Record: GDN-Hybrid + TMA Megakernel + Brotli-11 — val_bpb 1.01195 (3-seed mean) | andrewbaggio1 | #1672 |
| 1.0167 | Record: GDN-Hybrid + Sliding Window Attention + compressed-code warmdown1000 (cold-cache, 1.01671 BPB) | joshkmartinez | #1575 |
| 1.0167 | Record: GDN-Hybrid + Sliding Window Attention + compressed-code warmdown1000 - val_bpb 1.01671 (3-seed mean) | joshkmartinez | #1576 |
| 1.0171 | Record: GDN-Hybrid + Sliding Window Attention (cold-cache, 1.01710 BPB) | joshkmartinez | #1564 |
| 1.0171 | Record: GDN-Hybrid + Sliding Window Attention (cold-cache, 1.01710 BPB) | joshkmartinez | #1622 |
| 1.0190 | Record: GatedDeltaNet FLA + Brotli (No TTT) — val_bpb 1.01902 (3-seed mean) | aamodbhatt | #1712 |
| 1.0205 | Record: GDN-Hybrid (Gated DeltaNet + Sliding Window Attention) - quantized_bpb 1.02046 | joshkmartinez | #1562 |
| 1.0205 | Record: GDN-Hybrid (Gated DeltaNet + Sliding Window Attention) | joshkmartinez | #1563 |
| 1.0208 | GDN-Hybrid: Fix warmdown/SWA/QAT timing — 1.0208 BPB | OE-GOD | #1681 |
| 1.0217 | SOTA Attempt: Paid prefix (val_bpb=1.0238) | spokane-way | #168 |
| 1.0222 | Record: XSA-all + Depth Recurrence + Hedge Mixer TTT (val_bpb=1.0222, 3-seed mean) | stukenov | #745 |
| 1.0226 | New Record: Pure Neural GDN 1.0226 BPB (shalyhinpavel) | shalyhinpavel | #875 |
| 1.0244 | Record: 1.0240 BPB — Multi-Order N-gram Backoff + Entropy-Adaptive Alpha (100% autonomous research via goldfish) | lukacf | #702 |
| 1.0274 | Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.0274 (2-seed mean) | Hkoyuer | #1632 |
| 1.0277 | Record: PR #1738 + PreQuant TTT LR=1e-3 + Unfrozen — val_bpb 1.02767 (3-seed mean) | kilojoules | #1758 |
| 1.0278 | Record: XSA-all + Depth Recurrence + Hedge Mixer TTT (val_bpb=1.0278, 3-seed mean) | stukenov | #733 |
| 1.0283 | Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.028308 (3-seed cold-cache mean) | Abhishek8108 | #1544 |
| 1.0283 | Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.028308 (3-seed cold-cache mean) | Abhishek8108 | #1545 |
| 1.0321 | Gravity Tokenizer: 1.0321 BPB via ablation leverage vocabulary optimization | dcrow85 | #755 |
| 1.0337 | Record: XSA-all + LeakyReLU² + VR + GA + 7-gram cache (val_bpb=1.0337) | Asukabot0 | #715 |
| 1.0339 | Record: K_KVShare_Wider FLA — val_bpb 1.0339 (3-seed mean) | genji0306 | #1705 |
| 1.0340 | 11L LeakyReLU² + XSA-all + Full GPTQ + 5-gram Backoff (1.0340 BPB) | xexyz | #792 |
| 1.0354 | Record: PR #1735 + CaseOps Tokenizer V15 (val_bpb 1.03540, mean of 3 seeds) | alertcat | #1738 |
| 1.0362 | Record: 1.0362 BPB — SGD Momentum 0.95 TTT + HedgeMixer + Per-Layer LR | dexhunter | #995 |
| 1.0365 | Record: 8L Paid Prefix + Sparse Hard Blocks (1.0365) | nicolasdickenmann | #278 |
| 1.0366 | Record: Chained TTT — Cosine Recovery + Multi-Pass Scoring (3-seed mean val_bpb=1.0366) | andrewbaggio1 | #685 |
| 1.0400 | Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA | pentxayc | #731 |
| 1.0409 | Record: K_KVShare_Wider full-recipe FLA — val_bpb 1.04090 (3-seed mean) | resouer | #1687 |
| 1.0429 | Record: SP8192 + Parallel Pre-Quant TTT — val_bpb 1.0429 (3-seed mean) | AjAnubolu | #1735 |
| 1.0450 | Record: 1.0450 BPB — SGD TTT + HedgeMixer with Per-Layer LR Groups | dexhunter | #967 |
| 1.0461 | Podracing: 1.0461 BPB (3-seed mean) | newjordan | #674 |
| 1.0461 | Podracing: 1.0461 BPB (3-seed mean) — 5-gram eval + LeakyReLU² | newjordan | #706 |
| 1.0465 | Record: 11L XSA-all + 7-gram cache (mean val_bpb=1.0465) | hypery11 | #758 |
| 1.0467 | E2E TTT: End-to-End Test-Time Training with Meta-Learning (1.0467 BPB) | gowtham0992 | #872 |
| 1.0467 | E2E TTT: End-to-End Test-Time Training with Meta-Learning (1.0467 BPB) | gowtham0992 | #873 |
| 1.0487 | Record: pcloadloveletter v6 — Novel Codebook+Huffman Compression + AdamW TTT (val_bpb=1.0487) | NotADevIAmaMeatPopsicle | #532 |
| 1.0523 | Record: 11L XSA4 + Multi-Pass Streaming Score-First Legal TTT (3-seed mean val_bpb=1.0523) | Sarimsaljook | #573 |
| 1.0539 | Record: 8L Paid Prefix + SmearGate + Int6 (val_bpb=1.0539) | ibarrajo | #262 |
| 1.0539 | Non-record: Paid Prefix Research (val_bpb=1.0539, ruled out-of-scope) | ibarrajo | #275 |
| 1.0541 | Record Submission: 1.0541 BPB - 5-expert Hedge Mixer + CROWN-Q + stride=64 | RoyiRa | #700 |
| 1.0573 | Record: Casefold V4 + AttnOutGate + Multi-Phase Global SGD TTT — val_bpb 1.05733 (3-seed mean) | dexhunter | #1693 |
| 1.0574 | Record: 11L Sidecar48 + Enhanced Attention + Async Data Pipeline + AdamW TTT (20 epochs, cosine LR, 3-seed mean val_bpb=1.0573) | DeepReinforce | #684 |
| 1.0577 | SR-CM-P2Loss: 1.0577 bpb (~15.06MB) | estesryan | #1180 |
| 1.0585 | SAFE_SUBMISSION: run036-safe016 (1.05850 BPB) | joshkmartinez | #1624 |
| 1.0585 | SAFE_SUBMISSION: run036-safe016 (1.05850 BPB) | joshkmartinez | #1633 |
| 1.0587 | Record: SP8192 + Pre-Quant AdamW TTT + Compiled TTT — val_bpb 1.0587 (3-seed mean) | translatingthename | #1539 |
| 1.0587 | Non-record: Pre-Quant AdamW TTT (Compiled) + SP8192 + Depth Recurrence — val_bpb 1.0587 (3-seed mean) | translatingthename | #1550 |
| 1.0597 | Record: Casefold V4 Tokenizer + Multi-Phase Global SGD TTT — val_bpb 1.05970 (3-seed mean) | dexhunter | #1670 |
| 1.0600 | Record: SP8192 + Recur345 + Par7 + EMA + QK5.25 + Pre-Quant TTT 10ep — val_bpb 1.0600 (3-seed mean) | ndokutovich | #1487 |
| 1.0616 | SP8192 + SLOT-4 + TTT + 3-Layer Recurrence + Parallel Residuals (1.0616 BPB) | powerpratik | #1647 |
| 1.0622 | Record: 11L XSA4 + LeakyReLU(0.5)² + Cosine TTT 50ep (val_bpb=1.0622) | sofiabod | #518 |
| 1.0632 | Record: Depth Recurrence + Banked Muon + Pre-Quant TTT (18ep) — val_bpb 1.0632 (3-seed mean) | RulinShao | #1517 |
| 1.0639 | Record: Casefold Tokenizer + Parallel Residuals + Systems Optimization — val_bpb 1.0639 (3-seed mean) | codemath3000 | #1585 |
| 1.0651 | Record: CaseOps Tokenizer + Recurrence Depth Curriculum + Base Arch Stack — val_bpb 1.06505 | romeerp | #1756 |
| 1.0655 | Record: SP8192 + CaseOps + GatedAttn + QuantGate + Loop45 + PhasedTTT — val_bpb 1.06549 | dexhunter | #1736 |
| 1.0668 | Record: Custom Casefold Tokenizer — 1.0668 BPB | mikeapedia | #1578 |
| 1.0672 | Record: SwiGLU + XSA4 + U-Net + AdamW TTT (3-seed mean val_bpb=1.0672) | JoeProAI | #462 |
| 1.0678 | Record: CaseOps Tokenizer + Tapered WD - val_bpb 1.0678 (3-seed mean) | romeerp | #1729 |
| 1.0679 | Record: SP8192 + 3-Layer Depth Recurrence + Parallel Residuals + EMA + QK5 + Pre-Quant AdamW TTT — val_bpb 1.0679 (3-seed mean) | ndokutovich | #1485 |
| 1.0698 | Record: 11L Sidecar48 + Enhanced TTT (cosine LR, 20 epochs) — 1.0698 BPB (3-seed mean) | teddyoweh | #581 |
| 1.0705 | Record: AdamW TTT 30ep Cosine + Per-Layer LR (val_bpb: 1.0705) | sunnypatneedi | #771 |
| 1.0713 | Record(?): WARP (Word-Aware Representation Priors) — val_bpb 1.0713, 1xH100 10min, 13.65 MB | ahmetdenizyilmaz | #1252 |
| 1.0714 | RECORD: SmearGate + Attention Output Gate + Legal TTT — val_bpb=1.07139 | MarioPaerle | #1667 |
| 1.0717 | Record: 10L + 7-gram eval cache (mean val_bpb=1.0717) | hypery11 | #724 |
| 1.0719 | Record: VarLen Attention + Fused MLP + Multi-Phase Global SGD TTT — val_bpb 1.07193 (3-seed mean) | dexhunter | #1626 |
| 1.0722 | Record: SP8192 MP-SGD TTT (4 phases) + QK-Gain 5.25 — val_bpb 1.07217 (3-seed mean) | yahya010 | #1727 |
| 1.0722 | Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix | amrayach | #1740 |
| 1.0722 | Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix | amrayach | #1741 |
| 1.0722 | Add SP8192 Multi-Phase Global SGD + Phased TTT (1.07219 bpb) | jorge-asenjo | #1700 |
| 1.0722 | Record: 1.0722 BPB — Improved TTT + HedgeMixer with Per-Layer LR Groups | dexhunter | #953 |
| 1.0723 | [Submission] SP8192 FullStack PartialRoPE LeakyReLU - 2026-04-19 | sakthivarshans | #1737 |
| 1.0729 | Add phased global SGD TTT prefix submission | romeerp | #1610 |
| 1.0736 | Record: SP1024 + Pre-quant TTT + Parallel Residuals — 1.0736 BPB (beats 1.1147 by 3.66%) | joshkmartinez | #1489 |
| 1.0740 | Add SP10240 + FreqGPTQ + lowercase tokenization: 1.07399 BPB | nothingLiva | #1707 |
| 1.0741 | Record: VarLen Attention + Triton Fused MLP + Doc-TTT + Warmdown 0.75 + Chunk 48 — val_bpb 1.07406 (3-seed mean) | dexhunter | #1560 |
| 1.0742 | Recursive Transformer - Non-Record Submission — 1.07424983 val_bpb (4h depth-recurrent hybrid transformer run) | newjordan | #1535 |
| 1.0744 | Record: ImprovedParallelResiduals, 1.0744 BPB / 2.7752 nats, -0.0034 BPB / -0.0088 nats vs PR #1523 | msisovic | #1529 |
| 1.0744 | [Non-record] Experimentation Summary: Autopsy of 100+ Experiments — What Worked, What Didn’t, Mind Map for LLM Agents, etc. | SPThole | #1602 |
| 1.0745 | Record: 5-expert Hedge Mixer + TTT (3-seed mean val_bpb=1.0745) | RoyiRa | #687 |
| 1.0745 | Record: 5-expert Hedge Mixer + TTT (3-seed mean val_bpb=1.0745) | RoyiRa | #688 |
| 1.0746 | Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT + CaseOps Tokenizer — val_bpb 1.07462 (3-seed mean) | OE-GOD | #1755 |
| 1.0749 | Record: Per-Layer Adaptive GPTQ Clip + int7 Embeddings + MLR 0.026 — val_bpb 1.07493 (3-seed mean) | dexhunter | #1586 |
| 1.0752 | Record: Improved Parallel Residuals + Systems Optimization — val_bpb 1.0752 (3-seed mean) | codemath3000 | #1584 |
| 1.0756 | Non-record: xIELU Piecewise Quadratic Activation + Per-Layer QK Gain Convergence | mikeapedia | #1648 |
| 1.0759 | [Record] Stage 3 + SpinQuant V1 + MP-SGD-TTT — val_bpb 1.0759 | X-Abhishek-X | #1695 |
| 1.0764 | Record: TMA Megakernel + Improved Parallel Residuals + Tap-In min_match=1 — val_bpb 1.07636 (3-seed mean) | andrewbaggio1 | #1555 |
| 1.0764 | Record: Varlen attention + fused MLP + doc-independent TTT (1.07643) | samacqua | #1530 |
| 1.0766 | Record: SP4096 + Depth Recurrence + Parallel Residuals + Causal SLOT-16 — val_bpb 1.0766 (3-seed mean) | aryanbhosale | #1333 |
| 1.0771 | Non-record: Neural Base Model, No TTT — Parcae + Gates + Layered Windows (val_bpb 1.07706) | mikeapedia | #1728 |
| 1.0773 | Record: SP8192 + Improved Parallel Residuals + Muon 0.97 + TTT 5ep + N-gram Tilt + Hessian SDClip — val_bpb 1.07730 | ndokutovich | #1557 |
| 1.0775 | Record: SP8192 + VarLen Attention + Doc-Independent LoRA TTT + Banking + Muon 0.97 — val_bpb 1.07747 (3-seed mean) | dexhunter | #1536 |
| 1.0777 | Record: SP8192 + VarLen Attention + LoRA TTT + Fused MLP — val_bpb 1.0777 (3-seed mean) | aryanbhosale | #1540 |
| 1.0778 | Record: SP8192 + Triple Recurrence + Banking + Fused MLP + Muon 0.97 — val_bpb 1.0778 (3-seed mean) | EthanYangTW | #1523 |
| 1.0778 | Record: SP8192 + Improved Parallel Residuals + Muon 0.97 + LR 0.03 + Legal TTT — val_bpb 1.07785 (3-seed mean) | bigbag | #1541 |
| 1.0780 | Record: SP8192 + Parallel Residuals + 3-Layer Recurrence + Legal N-gram Tilt — val_bpb 1.07800 (3-seed mean) | dexhunter | #1437 |
| 1.0781 | Record: 30ep Cosine TTT on LeakyReLU² stack (3-seed mean val_bpb=1.0781) | andrewbaggio1 | #672 |
| 1.0783 | Record: SP8192 + Triple Recurrence + Banking + Fused MLP + Muon 0.97 — val_bpb 1.0783 (3-seed mean) | EthanYangTW | #1561 |
| 1.0785 | Record: SP8192 + Hadamard Rotation + AWQ + Layer-wise Precision + Hessian-Aware Calibration + Legal TTT — val_bpb 1.0785 (3-seed mean) | Victory963 | #1731 |
| 1.0785 | Record: SP8192 + Hadamard Rotation + AWQ + Layer-wise Precision + Hessian-Aware Calibration + Legal TTT — val_bpb 1.0785 | Victory963 | #1732 |
| 1.0787 | Record: SP8192 + Pre-Quant TTT (QK 5.25, 8ep, freeze-1) — val_bpb 1.0787 (3-seed mean) | aamodbhatt | #1482 |
| 1.0788 | Record-track: Trajectory-State Readout + Muon 0.98 + Legal TTT (1.0788) | aazizyan | #1676 |
| 1.0788 | Record: SP8192 + BigramHash d=32 + Path A v3 passthrough quantization — val_bpb 1.07882 (3-seed mean) | himanshudongre | #1716 |
| 1.0788 | Non-record: Eval-time lever ablations on SP8192 absolute-RoPE stack (companion to PR #1716) | himanshudongre | #1718 |
| 1.0788 | Record: Wider Loop + Per-Pass Embeddings + Tap-In V6 + Legal TTT (1.078825 3-seed mean) | abaybektursun | #1518 |
| 1.0790 | Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT — val_bpb 1.0790 (5-seed mean) | aryanbhosale | #1533 |
| 1.0791 | Record: SP8192 + Pre-Quant TTT + QK-Gain 5.0 + Depth Recurrence + MuonEq-R — val_bpb 1.0791 (3-seed mean) | aryanbhosale | #1423 |
| 1.0795 | Record: SP8192 + Pre-Quant TTT — val_bpb 1.07948 (3-seed mean) | erichroepke | #1416 |
| 1.0797 | Record: SP8192 + Depth Recurrence x2 + GPTQ + Score-First TTT + fused-softcap-ce -- val_bpb 1.07974 (3-seed mean) | anthony-maio | #1572 |
| 1.0798 | Record: SP8192 + Muon 0.97 + Legal Score-First TTT — val_bpb 1.07983 (3-seed mean) | dexhunter | #1514 |
| 1.0799 | Non-record: SP8192 + LoRA on tied embedding (1.07994, 1 seed) | yijieyuan | #1759 |
| 1.0800 | Record: dTTT + BigramHash 3072×112 — val_bpb 1.0800 (3-seed mean) | aamodbhatt | #1408 |
| 1.0801 | Record: SP8192 + Systems Optimization — val_bpb 1.0801 (3-seed mean) | codemath3000 | #1583 |
| 1.0801 | Record: Triple Loop + Fused Kernels + Parallel Residuals + N-gram Tilt; val_bpb 1.08014 (5-seed mean) | abaybektursun | #1420 |
| 1.0802 | Record: SP8192 + Muon 0.97 + 3-Layer Recurrence + Parallel Residuals + TTT — val_bpb 1.0802 (3-seed mean) | aryanbhosale | #1521 |
| 1.0803 | Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT + Asynchronous Data Loader - val_bpb 1.0803 | nogakeren | #1532 |
| 1.0805 | Non-Record: Polar Express Muon negative result (1.0805 BPB, +0.0004 vs standard NS5) | dexhunter | #1516 |
| 1.0806 | Scylla (novel tokenizer) + Legal Score-First TTT (val_bpb: 1.08056553) | simon-marcus | #1143 |
| 1.0807 | Record: Discriminative TTT — val_bpb 1.0807 (3-seed mean) | resouer | #1351 |
| 1.0809 | Add SP8192 qkramp05 + par-residual L6 + legal TTT systems rerun (1.080885 seed 42) | Buld1n | #1688 |
| 1.0809 | Add W104 SP8192 LegalTTT record candidate | teslaeco | #1750 |
| 1.0809 | Record: QK-Gain 5.5 — val_bpb 1.0809 (3-seed mean) | G3sparky | #1715 |
| 1.0810 | Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean) | bigbag | #1492 |
| 1.0810 | Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean) | bigbag | #1493 |
| 1.0810 | Submission: SP8192 + DepthRecur + MuonEq-R + SGD-TTT + SDClip GPTQ + Brotli-11 | AVINASH0052 | #1658 |
| 1.0812 | Add non-record SP8192 pass-gated recurrence submission | Buld1n | #1697 |
| 1.0813 | Non-record: SP8192 D 5-seed base and R-series evidence package | amrayach | #1598 |
| 1.0813 | Add near-SOTA SP8192 LegalTTT 3-seed reproduction | teslaeco | #1725 |
| 1.0815 | Non-record: SP8192 + QK5.4 + Legal Score-First TTT(4) — val_bpb 1.08149 (seed 1337) | aamodbhatt | #1706 |
| 1.0818 | Notable: SP8192 + 3-Layer Recurrence + Parallel Residuals - 5-Seed Quantization Reference and SDClip Ablations | kiyoaki | #1720 |
| 1.0819 | Record: PROTEUS v1.6 — Scylla + Parallel Residuals + Depth Recurrence + Legal TTT — val_bpb 1.0819 (3-seed mean) | MatoTeziTanka | #1289 |
| 1.0820 | Submission: SP8192 + Partial RoPE (16/64) + GPTQ SDClip + SGD TTT — val_bpb 1.0820 (3-seed mean) | swapp1990 | #1747 |
| ★ | 1.0822 | Record: SP8192 + Parallel Residuals + Score-First TTT — val_bpb 1.0822 (3-seed mean) | aryanbhosale | #1477 |
| 1.0822 | SP8192 + Adaptive Hessian-Sensitivity GPTQ Clipping — 1.0822 bpb | chris-colinsky | #1689 |
| 1.0824 | SP8192 + Gated Attention + NorMuon + Norm-PCT-Dropout + Legal TTT — val_bpb 1.0824 | taka6745 | #1520 |
| 1.0827 | Record: SP8192 + TTT + Eval-Time Hash Embedding — val_bpb 1.08269 (3-seed mean) | resouer | #1460 |
| ★ | 1.0828 | Record: SP8192 + QK-Gain 5 + Legal Score-First TTT — val_bpb 1.08279 (3-seed mean) | dexhunter | #1413 |
| 1.0829 | Notable Non-Record: Switched Deep Supervision (first DS submission) | channyzf6 | #1629 |
| 1.0832 | Adaptive Test-Time Training (TTT) with continuous LR-scaling. | kunwar-vikrant | #1639 |
| 1.0832 | Add Adaptive TTT experiment results | kunwar-vikrant | #1638 |
| 1.0832 | Add non-record SP8192 Parcae trajectory readout submission | Buld1n | #1703 |
| ★ | 1.0835 | Non-record: Parallel Residuals + Hessian-Aware SDClip (3-seed mean 1.08354 BPB) | Robby955 | #1412 |
| 1.0842 | [Record Submission] SP8192 + QK5 + Legal TTT — val_bpb 1.0842 | 15.99MB | aryan-cs | #1476 |
| 1.0845 | Non-record: QK4 Legal TTT Reproduction (1.08449 BPB) | N10ELabs | #1730 |
| 1.0846 | SP4096 + Depth Recurrence + Parallel Residuals + Legal N-Gram | someone114514 | #1534 |
| 1.0846 | Record: Causal SLOT + Pre-quant TTT — val_bpb 1.0846 (3-seed mean) | resouer | #1306 |
| 1.0848 | Record: TMA Megakernel + Triple Loop + Parallel Residuals — val_bpb 1.08480 | andrewbaggio1 | #1450 |
| 1.0850 | [non record] Investigating the Tied-Embedding Bottleneck: Why Boundary Blocks Underperform and What It Means for 16MB Models | SPThole | #1546 |
| 1.0853 | [Non-Record] Extended Compute Scaling Analysis: 1.0853 BPB at 50K steps (11.5 hours) on 4×A100MIG | OnlyJundong | #1005 |
| 1.0854 | Lucky V — 1.08540457 val_bpb (seed 444) | newjordan | #1322 |
| 1.0855 | Add: 11L Complement Training + TTT + No-JEPA submission (val_bpb 1.0855) | BoxiYu | #1257 |
| 1.0856 | Record: Scylla + GPTQ + BH3072 — val_bpb 1.0856 (3-seed mean) | anthony-maio | #1405 |
| ★ | 1.0856 | Record: SP8192 + GPTQ Embeddings + Depth Recurrence + MuonEq-R + SDClip — val_bpb 1.08563 (5 seed mean) | clarkkev | #1394 |
| 1.0857 | Record-track submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5 | ymrohit | #988 |
| 1.0857 | SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.5 + SGD-TTT [LoRA-TTT Future Work] | Anakintano | #1714 |
| 1.0858 | Non-record: Extended Compute Scaling Analysis (50K steps, 1.0858 BPB, 3 seeds (each run ~12 hours on 4xA100MIG)) | OnlyJundong | #1424 |
| 1.0858 | Record: SP8192 + 3-Layer Recurrence + Hard Onset — val_bpb 1.0858 | pablinga19 | #1660 |
| 1.0862 | Record: SP8192 + 3-layer recurrence + hard onset — val_bpb 1.0862 (3-seed mean) | pablinga19 | #1662 |
| 1.0862 | Record: SP8192 + 3-Layer Recurrence + Hard Onset — val_bpb 1.08625 (3-seed mean) | pablinga19 | #1663 |
| 1.0865 | Record: Loqui Auris — 10L + LoRA TTT (mean val_bpb=1.0865, 2 seeds) | LoquiAuris | #548 |
| 1.0866 | [Record] SP8192 + SDClip + 3-Layer Depth Recurrence + EMA 0.9965 — val_bpb 1.0866 | X-Abhishek-X | #1471 |
| 1.0872 | Non-Record: SP8192 + LeanICQ Compose at Int3 — val_bpb 1.08720 / 15.88 MB | dexhunter | #1515 |
| 1.0876 | Record: Scylla + Parallel Residuals + Depth Recurrence + Legal TTT — val_bpb 1.0876 (3-seed mean) | MatoTeziTanka | #1274 |
| 1.0881 | Add non-record submission: SP8192 baseline + LZMA code-wrap | upascal | #1754 |
| 1.0887 | Record: 11L Depth Recurrence + Discriminative Pre-Quant TTT (8xH100) — val_bpb 1.0887 (3-seed mean) | aamodbhatt | #1406 |
| 1.0889 | [submission] SP8192 + QK5 + Freeze10 Loss-Gated Legal TTT (1.08885521) | MuhammedErinArchitecture | #1744 |
| 1.0889 | Non-record: 4-Hour Progressive Depth — val_bpb 1.0889 | iverbovoy | #895 |
| 1.0889 | [Record] 3-Layer Depth Recurrence + EMA 0.9965 + WD 0.095 — val_bpb 1.0889 | X-Abhishek-X | #1445 |
| 1.0896 | Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R + Legal TTT — val_bpb 1.0896 (3-seed mean) | aryanbhosale | #1326 |
| 1.0896 | GatedAttn + ValueResid + XSA6 + HedgeMixer + Legal TTT — val_bpb: 1.08965 (3-seed mean) | sahiee-dev | #824 |
| ★ | 1.0897 | Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R + QK-Gain 5.0 — val_bpb 1.0897 (3-seed mean) | aryanbhosale | #1334 |
| 1.0898 | Record: Pre-Quant TTT + ETLB: Eval-Time Logit Bias for Neural Language Model Compression 1.0898 BPB on PR #1285 base | AnubhavBharadwaaj | #1399 |
| 1.0900 | Record: MuonEq-R + 3-Layer Recurrence + WD=0.095 + MLR=0.022 + All-Int6 — val_bpb 1.0900 (3-seed mean) | dexhunter | #1331 |
| 1.0903 | Record: Scylla + n-gram + legal TTT — val_bpb 1.0903 (3-seed mean) | Campbellb | #1242 |
| 1.0909 | Record: 9L XSA-all + LeakyReLU² + 5-gram eval cache — val_bpb 1.0909 (3-seed mean) | resouer | #740 |
| 1.0909 | Non-record: Int5 GPTQ + Wider MLP | sergeevii123 | #1646 |
| ★ | 1.0912 | Record: MuonEq-R + Depth Recurrence + WD=0.090 + All-Int6 GPTQ — val_bpb 1.0912 (3-seed mean) | dexhunter | #1285 |
| 1.0913 | Record: SP4096 + 3-Layer Recurrence + GPTQ Embeddings + SDClip + ETLB — val_bpb 1.0913 (3-seed mean) | bigbag | #1415 |
| 1.0916 | Add 11L Shared Sparse Sidecar + EMA + AdamW TTT (1.0916 mean) | ymrohit | #555 |
| 1.0920 | Record: 5-gram Eval Cache + LeakyReLU² + Parallel Muon val_bpb: 1.0920 (3-seed mean, std 0.0007) | ~15.9 MB | 8×H100 SXM | deanbrr | #659 |
| 1.0920 | Non-record: 11L GEPA + 30k Steps + Pure Int6 + Legal TTT (val_bpb=1.0920) | Christopher-Lee-McClendon | #668 |
| 1.0923 | Record: SP4096 + Polar Express + MuonEq-R + Depth Recurrence — 1.0923 BPB (3-seed) | Omrigotlieb | #1344 |
| 1.0924 | Record: MuonEq-R + Depth Recurrence + N61 Mixed GPTQ — val_bpb 1.0924 (3-seed mean) | dexhunter | #1279 |
| 1.0924 | Record: SP4096 + Linear LR + Depth Recurrence -- val_bpb=1.0924 (3-seed mean) | dttdrv | #1395 |
| 1.0925 | Record: Vocab4096 + MLP4.0x + SLOT - val_bpb 1.0925 (3-seed mean) | dentity007 | #1291 |
| 1.0925 | [Record] 11L Depth Recurrence + EMA Tuning (0.9965) — val_bpb 1.0925 | X-Abhishek-X | #1421 |
| 1.0926 | Record: SP4096 + Depth Recurrence + MuonEq-R + Full GPTQ — val_bpb 1.0926 (3-seed mean) | aryanbhosale | #1296 |
| 1.0929 | feat: Non-record 11L PR940 Stack (no n-gram in use) + 20k Steps + Legal TTT (1.0929 BPB) | Christopher-Lee-McClendon | #1232 |
| 1.0929 | Record: MuonEq-R + Depth Recurrence + Mixed Int5/Int6 GPTQ — val_bpb 1.0929 (3-seed mean) | dexhunter | #1260 |
| 1.0941 | Submission: 1.0941 BPB by David Weyh | Upsalla | #622 |
| 1.0944 | Non-record: 11L GEPA + 25k Steps + Pure Int6 + Legal TTT (val_bpb=1.0944) - unlimited compute category | Christopher-Lee-McClendon | #644 |
| 1.0945 | N-gram Cache + Entropy-Adaptive Alpha: 1.0945 BPB | danielxmed | #1026 |
| 1.0955 | Record: SP2048 + 3-Layer Recurrence + SWA + BigramHash + Legal TTT — val_bpb 1.0955 (3-seed mean) | bigbag | #1338 |
| 1.0955 | Record: SP2048 + 3-Layer Recurrence + SWA + BigramHash + Legal TTT — val_bpb 1.0955 (3-seed mean) | bigbag | #1339 |
| 1.0959 | Record: SP4096 + Polar Express NS + MuonEq-R + WD=0.090 — 1.0959 BPB (3-seed mean) | Omrigotlieb | #1332 |
| 1.0960 | Non-record: 11L gated Krylov + AR GPTQ int6 + lzma, 1.09596 BPB | LauraGomezjurado | #1446 |
| 1.0960 | Non-record: Extended Compute Scaling Analysis (20K steps, 1.0960 BPB, 3 seeds (each run ~6 hours on 4xA100MIG)) | OnlyJundong | #1407 |
| 1.0962 | Record: QK-Gain 4.0 + XSA-11 + Muon-TTT + SLOT — val_bpb 1.0962 (3-seed mean) | bigbag | #1176 |
| 1.0963 | Lucky IV — 1.09626897 val_bpb (seed 444) | newjordan | #1286 |
| 1.0970 | Record: Cosine TTT scheduling with per-layer lr — mean val_bpb=1.0970 (3 seeds) | mrdavtan | #481 |
| 1.0970 | Record: VRL + Full GPTQ + 5-gram Cache + Hidden-State kNN-LM (3-seed mean val_bpb=1.0970) | gowtham0992 | #738 |
| 1.0970 | SP8192_LayerRecur_ParResid_QK525_paramOpt | yufang67 | #1570 |
| 1.0976 | Add non-record SP8192 Parcae prefix TTT readout-only submission | Buld1n | #1704 |
| ★ | 1.0978 | Record: 4096-Vocab + 4.0-MLP-mult + 0.085-WD + Simplifications — val_bpb 1.09785 (3-seed mean) | clarkkev | #1218 |
| 1.0980 | Record: 11L Depth Recurrence + BigramHash + EMA 0.9965 — val_bpb 1.0980 (3-seed mean) | AbhayAnandUCSD | #1435 |
| 1.0980 | WIP: Sequential GPTQ with Groupwise Int6 — improved post-training quantization on SP4096 base | zoharb157 | #1664 |
| 1.0983 | Non-record: 11L GEPA + 20k Steps + Pure Int6 + Legal TTT (val_bpb=1.0983): unlimited compute: 4×A100-40GB, ~2.8 hours | Christopher-Lee-McClendon | #628 |
| 1.0988 | PR #414 + 30-Epoch Cosine TTT (1.0988 BPB) | xexyz | #691 |
| 1.0996 | GDN-Hybrid + Legal Score-First TTT + Full-Hessian GPTQ Int6 | gracebml | #1749 |
| 1.1000 | Non-Record: TTT and GPTQ Are Fundamentally Incompatible — Quantized Weight Structure Defeats Test-Time Adaptation | himanshudongre | #1341 |
| 1.1015 | Record: SLOT + Split-LR + Full GPTQ + XSA-all — val_bpb 1.1015 (3-seed mean) | dexhunter | #1172 |
| 1.1016 | Add non-record SP8192 trajectory readout submission | Buld1n | #1701 |
| 1.1020 | Add record: SP4096 + Depth Recurrence + Parallel Residuals + QK-Gain + Brotli (1.1020 BPB) | Its-Just-Crump | #1392 |
| 1.1025 | Record: Pre-quant AdamW TTT + QK-Gain 4.0 — val_bpb 1.1025 (3-seed mean) | stukenov | #1364 |
| 1.1026 | [Submission] EngramLite + Mousse + Progressive Depth Recurrence + TTT — val_bpb 1.1026 | 15.95MB | 8×H100 | Mertyandimata | #1440 |
| 1.1027 | Record: 11L EMA + AdamW TTT 10ep (mean val_bpb=1.1027) | sjp611 | #442 |
| 1.1027 | Record: MuonEq-R + Context-Only SLOT + QK_GAIN=5.0 — val_bpb 1.1027 (3-seed mean) | bigbag | #1217 |
| 1.1035 | Record: Hadamard-Rotated GPTQ + dTTT + Recur2 (1.1035 BPB) | tmancino | #1400 |
| 1.1035 | Slot Machine — 1.10350531 val_bpb (seed 444) | newjordan | #1282 |
| 1.1036 | val_bpb 1.1036 - 12L sp9000 + depth recurrence + hash-TTT | Idan3011 | #1565 |
| 1.1043 | Record: Polar Express NS + SLOT + MuonEq-R + XSA-all — 1.1043 BPB (3-seed mean) | Omrigotlieb | #1297 |
| 1.1043 | Record: Polar Express NS + SLOT + MuonEq-R + XSA-all — 1.1043 BPB (3-seed mean) | Omrigotlieb | #1298 |
| 1.1047 | Non-Record: SP1024 + Depth Recurrence + Adaptive Markov Curriculum + Auto-QMax GPTQ + TTT — val_bpb 1.1047 | Mertyandimata | #1397 |
| 1.1047 | Non-Record: SP1024 + Depth Recurrence + Adaptive Markov Curriculum + Legal TTT — val_bpb 1.1047 | Mertyandimata | #1398 |
| 1.1048 | Record: Vocab4096 + MLP4.0x + WD0.085 - val_bpb 1.1048 (3-seed mean) | dentity007 | #1287 |
| 1.1056 | Non-record: XSA-11 + Parallel Residual (L7+) + Depth Recurrence — val_bpb 1.1056 (1-seed, 1×H100) | PhamPhuHoa-23 | #1467 |
| 1.1057 | Midnight 12L — 1.10567949 val_bpb (seed 444) | newjordan | #1458 |
| ★ | 1.1063 | Record: ParallelResiduals + MiniDepthRecurrence, 1.1063 BPB / 1.8679 nats, -0.0072 vs PR #1179, -0.0143 vs merged SOTA | msisovic | #1204 |
| 1.1063 | Non-record: TWEO early-cosine outlier regularization on SP1024 baseline | PapaFranku4647 | #1635 |
| 1.1064 | Record: Full GPTQ + Score-First TTT + SLOT — val_bpb 1.1064 (3-seed mean) | andrewbaggio1 | #1209 |
| 1.1064 | Non-record: Does SLOT violate causal dependence? (empirical test + question) | andrewbaggio1 | #1240 |
| 1.1067 | Record: Combined 3-Layer Recurrence + Parallel Residuals + Polar Express + Brotli — val_bpb 1.1067 (3-seed mean) | erichroepke | #1396 |
| 1.1070 | Record: XSA + LoRA TTT (val_bpb=1.1070) | Elarwei001 | #1254 |
| 1.1077 | Add non-record submission: 12L 24min Vocab1792 FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ MixedBits | shram86 | #1495 |
| 1.1078 | Record Submission: 1.1078 BPB — XSA6 + BigramHash4K on Hedge Mixer Stack | agalimova | #720 |
| 1.1078 | Record: Split-LR + N-gram Agreement + Full GPTQ — val_bpb 1.1079 (3-seed mean) | vlivashkin | #1302 |
| 1.1079 | Non-record: 11L GEPA + 12k Steps + Pure Int6 + Legal TTT (val_bpb=1.1079) | Christopher-Lee-McClendon | #612 |
| 1.1084 | Record: PR #1105 + window attn + mixed seq_len — 1.1084 bpb (3-seed mean) | Gusanidas | #1219 |
| 1.1085 | 1.1085 BPB: JEPA + AdamW TTT + Full GPTQ + FA3 + LZMA | NewyorkDev | #1006 |
| 1.1086 | Record Submission: 1.1086 BPB - Turbo-Muon + EngramLite + ParamBanking (11L 512d) | mikeapedia | #1089 |
| 1.1088 | Non-Record: Unified Attention + FA3 + 1hr training (val_bpb=1.1088) | VirajDeshwal | #1270 |
| 1.1090 | [Notable Non-Record Submission] 1.1090 BPB - 74.3M Ternary U-Net Transformer (100k steps/3h) | CiprianFlorin-Ifrim | #923 |
| 1.1091 | review: Rerun of PR #1089 | AnirudhRahul | #1126 |
| 1.1092 | add varlen+fused mlp+ttt record | samacqua | #1354 |
| 1.1092 | Add non-record SP8192 tempered BPB-weighted loss submission | Buld1n | #1702 |
| 1.1093 | Record: 15L Depth Recurrence + LeakyReLU² + Cosine TTT (3-seed mean val_bpb=1.1093) | aruniyer | #857 |
| 1.1098 | review: Rerun of PR #1120 (Rascal) on 8xH100 SXM | dexhunter | #1177 |
| 1.1099 | val_bpb 1.1099 (3-seed mean) Rascal | newjordan | #1120 |
| 1.1100 | Record: Loqui Auris — 10L + SWA + Standard TTT (val_bpb=1.1100) | LoquiAuris | #595 |
| 1.1100 | Non-record: Comprehensive Negative Results — What Doesn't Work on Strong Models | andrewbaggio1 | #1272 |
| 1.1100 | Record: MuonEq-R + Context-Only SLOT + XSA-all + QK-Gain 5.0 | BiggerDABOSS | #1276 |
| 1.1100 | Submission/epsilon flashonly 2026 04 06 | teerthsharma | #1401 |
| 1.1101 | Add 07c1 strict RunPod base submission | amrayach | #1307 |
| 1.1101 | Record: 11L TrigramHash + ValueResidual + GradQuant + Cosine TTT (mean val_bpb=1.0887, best 1.0879) | ndokutovich | #486 |
| 1.1104 | Record: Depth Recurrence + MuonEq-R + AR Self-Gen GPTQ — val_bpb 1.1104 (3-seed mean) | aryanbhosale | #1290 |
| 1.1104 | [Non-record] E2E TTT at 27M scale — negative result (val_bpb 1.1104, SP1024) | ChideraIbe123 | #1625 |
| 1.1104 | Non-record: 11L s2048 4h on 1xA100 — 1.1104 BPB | xiehuanyi | #1528 |
| 1.1105 | Record: 11L Int5 + 6-Expert HedgeMixer + LeakyReLU(0.9)^2 + TTT (val_bpb=1.1105) | dttdrv | #849 |
| 1.1105 | Record: Split-LR + BigramHash(2816x160) + Full GPTQ + Brotli — val_bpb 1.1105 (3-seed mean) | dexhunter | #1179 |
| 1.1108 | Record: Window Attention + Mixed Seq_Len Training, bpb 1.1108, eval at 6144 (5-seed mean) | Gusanidas | #1212 |
| 1.1109 | Record: 1.1109 BPB Loader FullGPTQ XSA11 + online ngram augment | AnirudhRahul | #1145 |
| 1.1111 | val-only 10min record (val_bpb:1.1111) | daniellawson9999 | #44 |
| 1.1116 | Record: Fused Triton MLP + Full GPTQ + Coprime Loader + XSA-all + BH2816 (val_bpb 1.1116) | barneywohl | #1135 |
| 1.1117 | Record: Bank QAT + seq4096 + SWA w=256 + QK-Gain 2.5 + PKO — val_bpb 1.1117 (3-seed mean) | Itssshikhar | #1512 |
| 1.1123 | Record: 1.1123 BPB — Coprime-Stride Loader + Full GPTQ + XSA-all (3-seed mean) | dexhunter | #1060 |
| 1.1124 | Record: Aggressive SGD TTT (3-seed mean val_bpb=1.1124) | fielding | #757 |
| 1.1126 | Turbo-Muon + EngramLite + ParamBanking + GPTQ Reserve Opt — val_bpb 1.1126 (3-seed mean) | Bortlesboat | #1169 |
| 1.1129 | Add non-record AR GPTQ XSA ROTQ Hadamard submission | vermissa0ss | #1224 |
| 1.1130 | Sota 11 l submission | malc3om | #1077 |
| 1.1131 | Add in-progress non-record submission for Legal TTT + Muon+ + QK Gain 4.0 | scottcui-georgian | #1645 |
| 1.1133 | Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1133 (3-seed mean) | Bortlesboat | #1099 |
| 1.1135 | Record: SP4096 + Compressibility Regularization — val_bpb 1.11349 (6-seed mean) | jpfeiffe | #1508 |
| 1.1135 | 14L QEP GPTQ + Per-Window SGD TTT (1.1135 BPB, 3-seed) | himanalot | #1655 |
| 1.1136 | Non-record: 11L XSA-All + EMA + Legal GPTQ on 8xH100 (1.11355 BPB) | Rtx09x | #1694 |
| 1.1140 | Record: 1.1140 BPB — ResidLambdas + Split-LR + Train-Budget GPTQ + Coprime Loader (12-seed mean) | Gusanidas | #1130 |
| 1.1142 | Record: Val-Calibrated GPTQ + XSA-all + BigramHash 3072×112 | abaybektursun | #728 |
| 1.1142 | Non-record: Negative results — quantization algorithms & TTT on val-GPTQ stack | abaybektursun | #756 |
| 1.1143 | Non-record: Exact Sequence Matching on PR #1019 (1.1143 BPB) | cadenmcmann | #1309 |
| 1.1144 | Ultimate: GatedAttn + ValueResidual + Full QAT + lzma-9 + BigramHash(2048) | FlashyFlash3011 | #952 |
| 1.1145 | Record: 33.6M Int5 GPTQ + Score-First TTT (val_bpb=1.1145, 3-seed) | ibarrajo | #991 |
| 1.1145 | 1.1145 BPB: Parallel Muon + INT5 GPTQ + Legal TTT | EthanYangTW | #1171 |
| 1.1146 | Record: EngramLite + Gated Skips + Full GPTQ + FA3 — val_bpb 1.1146 (1-seed, 2 pending) | icryo | #1122 |
| 1.1146 | BPB-weighted training loss: align training objective with eval metric | elliottdehn | #1519 |
| 1.1147 | [Non Record] Learn to Learn: Meta-Learning-TTT Redesign — Cross-Chunk FOMAML + Delta-Loss + MetaSGD | SPThole | #1502 |
| 1.1147 | Memmap multi-shard data pipeline + GPU prefetch + LeakyReLU² + Legal TTT + Parallel Muon | DeepReinforce | #726 |
| ★ | 1.1147 | Record: AR Self-Gen GPTQ + XSA-all + BigramHash 3072×112 — val_bpb 1.11473 (3-seed mean) | abaybektursun | #1019 |
| 1.1147 | Non-record: Negative results — eval-time interventions, mixed-precision GPTQ, loss truncation | abaybektursun | #1103 |
| 1.1147 | WIP: Depth Recurrence via Weight-Shared Transformer Blocks | GitGeeks | #1278 |
| 1.1151 | Legal TTT (SGD, 3-epoch) + SLOT (lr=0.003, steps=5) on PR #549 base -- val_bpb: 1.11512 (3-seed mean, beats merged SOTA 1.1194) | sahiee-dev | #1150 |
| 1.1154 | Non-record: 11L XSA-all + Full GPTQ + Selective Pruning (val_bpb=1.1154, 3-seed) | saml212 | #609 |
| 1.1154 | Record: SLOT + LeakyReLU² + Legal Score-First TTT + Parallel Muon — val_bpb 1.1154 (3-seed mean, std 0.0002) | ~15.9 MB | 8×H100 SXM | AnubhavBharadwaaj | #1128 |
| 1.1156 | Record: AR Self-Gen GPTQ + XSA-11 + BigramHash3072x112 (mean 1.1156) | aamodbhatt | #1280 |
| 1.1156 | Non-record: 11L FullGPTQ + XSA-all + BigramHash 3072×112 — val_bpb 1.11564 (1-seed) | AVINASH0052 | #1473 |
| 1.1156 | Submission/sp8192 depthrecur adamwttt | AVINASH0052 | #1619 |
| 1.1157 | Add 11L XSA11 + BigramHash3072 + AdamW Legal TTT submission | someone114514 | #841 |
| 1.1158 | Full GPTQ + XSA-all + SWA/EMA (val_bpb=1.1158, 3-seed mean=1.1163) | Robby955 | #639 |
| 1.1158 | Record: 11L LatentMask TTT + GPTQ + Product-Key Bigram + Brotli — val_bpb 1.1158 (3-seed mean) | izlley | #1410 |
| 1.1159 | [Non Record] Learn to Learn: Position-Conditional Bigram Hashing + Meta-Learning + TTT Ablation | SPThole | #1501 |
| 1.1160 | Non-record: 10L + Batched LoRA TTT (val_bpb=1.1160) | hypery11 | #525 |
| 1.1160 | Non-record: 10L + Batched LoRA TTT (val_bpb=1.1160) | hypery11 | #557 |
| 1.1161 | Record: EGGROLL v2 — val_bpb 1.1161 (3-seed mean, std 0.0001) | haikosys | #1156 |
| 1.1162 | Record: int5 GPTQ + Soft-Round QAT (3-seed mean 1.1162) | EthanYangTW | #606 |
| 1.1163 | Record: Full GPTQ + LeakyReLU² + Parallel Muon + BigramHash 3072 (val_bpb 1.1163, 3-seed mean) | abaybektursun | #593 |
| 1.1163 | Stable Growing Recurrence: Progressive Depth + Error Feedback (non-record) | nestamidavaine | #1230 |
| 1.1163 | Non-record: Stable Growing Recurrence, Progressive Depth + Error Feedback | nestamidavaine | #1231 |
| 1.1164 | Record: Train Larger, Quantize Harder - 33.6M params + int5 GPTQ / (val_bpb: 1.1164) | cmcdnd | #576 |
| 1.1164 | Record: 11L XSA-all + LeakyReLU(0.5)² + VR + GA (val_bpb=1.1164, pending 3-seed) | Asukabot0 | #638 |
| 1.1169 | Record: Residual Input Mixing + mixed int6 GPTQ + grouped TTT + MLP 3.5x | danialht | #615 |
| 1.1170 | Record: Fused LeakyReLU² + Online GPTQ + Parallel Muon — val_bpb 1.117 (1-seed) | vimeto | #1072 |
| 1.1171 | Non-record: PR703 + shard-order curriculum + GPTQ cache-backout (1.1171) | petergpt | #783 |
| 1.1171 | Record: 11L XSA-all + Full GPTQ + Parallel Muon + Selective Pruning (val_bpb: 1.1171) | raahilshah | #634 |
| 1.1171 | Non-record: Negative results — hardware alignment & quantization on 8xH100 | abaybektursun | #670 |
| 1.1172 | Record: Residual Input Mixing + mixed int6 GPTQ + grouped TTT + MLP 3.5x | danialht | #790 |
| 1.1172 | Non-record: Depth Recurrence + GPTQ + SGD TTT (1.1172, 1xH100) | swapp1990 | #1422 |
| 1.1174 | Record: CROWN-Q + GPTQ + Legal TTT — val_bpb 1.1174 (3-seed mean) | EthanYangTW | #1129 |
| 1.1175 | Non-record: Cosine TTT 30ep on SwiGLU + U-Net (1xH100, val_bpb=1.1175) | andrewbaggio1 | #509 |
| 1.1175 | Record: 11L VRL + LeakyReLU² + Full GPTQ (3-seed mean val_bpb=1.1175) | gowtham0992 | #569 |
| 1.1175 | Non-record: 30ep Cosine TTT on SwiGLU + U-Net (1xH100, val_bpb=1.1175) | andrewbaggio1 | #661 |
| 1.1176 | Record: PR549 + MiLe decay + 8-bit Muon + 1.04x LR + Cache+Backout — val_bpb 1.1176 | Gusanidas | #703 |
| 1.1177 | Non-record: Exact Sequence Matching + TTT on PR #549 (1.1177 BPB) | cadenmcmann | #1310 |
| 1.1178 | Record: Late Soft-Round QAT + Score-First Backward-Looking TTT — val_bpb 1.1178 | RoyiRa | #589 |
| 1.1179 | int5 GPTQ + 33.6M model: 1.1179 BPB (3-seed mean) | EthanYangTW | #544 |
| 1.1179 | Record: int5 GPTQ + 33.6M model (3-seed mean val_bpb=1.1179) | EthanYangTW | #545 |
| 1.1179 | Record: int5 GPTQ + 33.6M model (3-seed mean val_bpb=1.1179) | EthanYangTW | #585 |
| 1.1179 | Record: 11L Muon TTT + Entropy-Adaptive Epochs (8×H100) — val_bpb 1.1179 (3-seed mean) | aamodbhatt | #999 |
| 1.1179 | Record: Muon TTT + Entropy-Adaptive Epochs — val_bpb 1.1179 (3-seed mean) | TimPietruskyRunPod | #1037 |
| 1.1179 | Record: 11L Muon Legal TTT + Entropy-Adaptive Epochs (8×H100) — val_bpb 1.1179 (3-seed mean) | aamodbhatt | #1148 |
| 1.1179 | Non-record: SLOT eval-time delta optimization + QK-Gain (val_bpb=1.1179) | ibarrajo | #1236 |
| 1.1180 | [10min/16mb] David Ghazaryan — MoE + BigramHash4096 (mean BPB: 1.11799) | davie2009kh | #1451 |
| 1.1180 | records: David Ghazaryan — MoE + BigramHash4096 (val_bpb 1.11799) | davie2009kh | #1538 |
| 1.1180 | Record: 10L + Batched LoRA TTT (mean val_bpb=1.1180, 3 seeds) | hypery11 | #713 |
| 1.1180 | Record: Full GPTQ + LeakyReLU² + Parallel Muon (3-seed mean 1.1180) | kshitizz36 | #626 |
| 1.1181 | Record: SwiGLU+VE128+NoTTT val_bpb=1.1181 (3-seed mean) | JoeProAI | #505 |
| 1.1182 | Record: Depth Recurrence (layers 4 and 5 repeated): val_bpb 1.1182 | msisovic | #686 |
| 1.1182 | Record: Depth Recurrence + SGD TTT : 1.1182 BPB | Naazimsnh02 | #752 |
| 1.1182 | Non-record: 33.6M Int5 GPTQ + Legal s_0-only TTT (val_bpb=1.1182) | ibarrajo | #1004 |
| 1.1184 | Add LeakyReLU² + 4ep Legal TTT submission | yufengli-oai | #1039 |
| 1.1184 | Architectural Record: 1.11837 BPB via KGIIR Trajectory Mixing | Adam-Jacuch | #965 |
| 1.1185 | Non-record: Empirical Bayes Adaptive TTT (val_bpb=1.1185) | Robby955 | #484 |
| 1.1185 | LeakyReLU(0.75)² + Legal TTT + Parallel Muon — 1.1185 BPB (3-seed mean) | michaelwinczuk | #977 |
| 1.1185 | Record: MTP-2 Funnel + LeakyReLU(0.75)² + Legal TTT + Parallel Muon | michaelwinczuk | #1031 |
| 1.1185 | Non-Record: SLOT Eval-Time Augmentation on PR #549 SOTA Stack val_bpb = 1.1185 (3-seed mean, std 0.0003) | ~15.9 MB | 8×H100 SXM | AnubhavBharadwaaj | #1084 |
| 1.1186 | Record: CROWN-Q + Full GPTQ + SWA/EMA Blend — val_bpb 1.1186 (3-seed mean) | EthanYangTW | #690 |
| 1.1186 | Record: CROWN-Q + Full GPTQ + SWA/EMA Blend — val_bpb 1.1186 (3-seed mean) | EthanYangTW | #692 |
| 1.1186 | Record: CROWN-Q + Full GPTQ + SWA/EMA Blend — val_bpb 1.1186 (3-seed mean) | EthanYangTW | #693 |
| 1.1187 | Add run_17 8xH100 submission (1.118685, <16MB) | adityakm24 | #1098 |
| 1.1187 | Add run_17 8xH100 submission (1.118685, <16MB) | adityakm24 | #1117 |
| 1.1187 | Submission: 11L XSA4 + TrigramHash + ValueResidual + Legal TTT (val_bpb=1.1187) | adityakm24 | #1118 |
| 1.1187 | Add 11L RotaryFix + LegalTTT + BIGRAM3072 — val_bpb 1.11869 (3-seed m… | Upsalla | #714 |
| 1.1187 | -0.0041 BPB by Reordering Training Data (Curriculum Learning) | abaybektursun | #650 |
| 1.1188 | Non-record: Three Approaches + Lessons Learned (best: 1.1188 BPB) | ibarrajo | #1001 |
| 1.1189 | Hymba-11L: SOTA High-Density Takeover (1.1189 BPB) | Prush69 | #852 |
| 1.1190 | [non-record] Sharpness-Aware Minimization (SAM) Inner Loop for Meta-TTT | SPThole | #1601 |
| 1.1190 | GPTQ Int6 + SGD Test-Time Training — A800 1.1190 bpb | ChaosCodes | #610 |
| 1.1190 | Three Breadsticks: 1.1190 BPB | newjordan | #656 |
| 1.1190 | Non-record: 1.1190 BPB — Independent PR #549 Reproduction (10min 8×H100) | manfromnowhere143 | #1069 |
| 1.1190 | Non-record: Aweb Ultimate — 1.1190 BPB (10min 8×H100, independent PR #549 reproduction) | manfromnowhere143 | #1070 |
| 1.1194 | Add Stack Integration + Legal TTT submission package | Taleef7 | #1050 |
| 1.1194 | Add_Maestro_Solar_Protocol_Joeavaib | Joeavaib | #625 |
| ★ | 1.1194 | Record: LeakyReLU² + Legal Score-First TTT + Parallel Muon — val_bpb 1.1194 (3-seed mean) | abaybektursun | #549 |
| 1.1194 | feat: depth recurrence + cosine recovery TTT | Danishlynx | #697 |
| 1.1194 | Record submission: Poly5 Softcap + BigramHash(3072) + Wider GPTQ-lite… | jimliu741523 | #816 |
| 1.1194 | Record: 1.1194 BPB — v9 Batched Muon + Full GPTQ Random Calib + JEPA Research | NewyorkDev | #1124 |
| 1.1194 | [Submission] Jtss-ux - 1.1301 BPB (10min_16mb) | Jtss-ux | #1269 |
| 1.1195 | Record: GPTQ + Legal TTT (3-seed mean val_bpb=1.1195) | EthanYangTW | #503 |
| 1.1195 | Record: GPTQ + Legal TTT (3-seed mean val_bpb=1.1195) | EthanYangTW | #528 |
| 1.1195 | Record: GPTQ + Legal TTT (3-seed mean val_bpb=1.1195) | EthanYangTW | #529 |
| 1.1196 | Non-record: Negative results from gated multi-order hash n-grams | xiayicheng3-code | #1369 |
| 1.1196 | Non-record: TTT chunk ordering does not improve BPB — negative results from 7 ordering variants | jpfeiffe | #1320 |
| 1.1198 | Non-record: Full GPTQ + XSA-4 + Score-First TTT (3-seed mean 1.1198) | Robby955 | #734 |
| 1.1198 | Non-record: Fused Triton relu^2 kernel — negative result (val_bpb=1.1198) | ibarrajo | #1237 |
| 1.1199 | Non-record: 11L NativeFlowMatcher + Legal TTT — val_bpb 1.1199 (single seed) | Christopher-Lee-McClendon | #1170 |
| 1.1201 | Non-record: 1.1201 BPB - Shared ValueEmbedding (tok_emb reuse, layers 5-10) + Legal TTT | mradassaad | #768 |
| 1.1204 | Record: 11L LeakyReLU² + Full GPTQ + QAT Alignment (val_bpb: 1.1204) | raahilshah | #535 |
| 1.1207 | GPTQ + Short TTT — val_bpb 1.1207 (seed 1337) | newjordan | #533 |
| 1.1207 | GPTQ + Short TTT — val_bpb 1.1207 (seed 1337) | newjordan | #577 |
| 1.1207 | Vocab 8192 Entropy Optimized: 1.1207 BPB (3-seed mean) | tyrel-beede | #1284 |
| 1.1208 | XSA-11 + GPTQ b64/pd002 — 3-seed mean val_bpb 1.1208 | newjordan | #587 |
| 1.1208 | Add non-record streaming legal TTT late-block submission | simon-marcus | #662 |
| 1.1211 | Non-record: XSA-all + mHC + Full QAT (val_bpb=1.1211) | autocode-rayes | #928 |
| 1.1213 | Non-record: 11L EMA + TTT(20ep,freeze=0) + 15-run ablation study — val_bpb=1.1213 (3-seed) | felipe-parodi | #398 |
| 1.1214 | Record: Legal Score-First TTT + Parallel Muon — val_bpb 1.1214 (3-seed mean) | abaybektursun | #473 |
| 1.1215 | GPTQ + Early QAT + Legal TTT — 3-seed mean val_bpb 1.1215 | newjordan | #508 |
| 1.1215 | GPTQ + Early QAT + Legal TTT — 3-seed mean val_bpb 1.1215 | newjordan | #578 |
| 1.1215 | Non-Record: 11L Parallel Muon + LN Scale + LeakyReLU² MLP3x + Legal TTT — val_bpb 1.1215 (3-seed mean) | aryanbhosale | #838 |
| 1.1216 | Record: 11L XSA4 + Tight SWA + FA3 + Two-Phase TTT (val_bpb=1.1216) | EthanYangTW | #410 |
| 1.1216 | Record: 11L XSA4 + Tight SWA + FA3 + Two-Phase TTT (val_bpb=1.1216) | EthanYangTW | #415 |
| 1.1216 | Non-record: QNA + SQWA compression thesis (8xH100 SXM) | Abhishek8108 | #975 |
| 1.1217 | Record: Adaptive Precision Embedding Quantization (4-seed mean val_bpb=1.1217) | nothingLiva | #1042 |
| 1.1219 | Full-Training QAT: 1.1219 bpb | autocode-rayes | #836 |
| 1.1219 | XSA-All 11L + LeakyReLU(0.75)² + Aggressive Legal TTT → 1.1219 BPB | teddyoweh | #1092 |
| 1.1220 | Non-record: Swarm-Guided KG-Conditioned Training (val_bpb=1.1220) | michaelwinczuk | #1081 |
| 1.1220 | 1.1220 bpb: GPTQ + EMA + XSA-all + BigramHash3072 (11L 512dim) | jorge-asenjo | #1361 |
| 1.1220 | Record: XSA-all + GPTQ + FA3 dtype fix (val_bpb: 1.1220) | G3sparky | #1494 |
| 1.1224 | [Record] Block Attention Residuals + Tuned Legal TTT — val_bpb 1.12242 (8xH100 primary) | kings-crown | #1696 |
| 1.1227 | Record: 11L XSA4 + Tight SWA + FA3 + Two-Phase TTT (3-seed mean val_bpb=1.1227) | EthanYangTW | #417 |
| 1.1227 | [track_10min_16mb] XSA7 + BigramHash + ValueResidual + Legal TTT — val_bpb=1.1227 | adityakm24 | #1182 |
| 1.1228 | Add non-record 16MB submission: quant-quality-first 1.12276 BPB | ciach | #1080 |
| 1.1228 | Add 11L TTT LoRA submission: SOTA architecture + per-document LoRA te… | ryanadamsai | #617 |
| 1.1229 | Record: 11L LeakyReLU² + VRL + lzma — val_bpb 1.1229 (3-seed mean) | anthony-maio | #175 |
| 1.1230 | Add non-record 11L XSA4 EMA run (val_bpb 1.12296, over 16MB) | kshitizz36 | #416 |
| 1.1230 | JEPArdy! Non-Record Submission - JEPA + Leader-Stack - val_bpb 1.1230 | simon-marcus | #1243 |
| 1.1231 | Record: 11L + Tight SWA + VE128 + Partial RoPE + LN Scale + TTT (val_bpb: 1.1231) | ElliotSlusky | #388 |
| 1.1231 | Frequency-Weighted Embedding Quantization (1.1231 BPB) | pattern4bots | #898 |
| 1.1231 | Non-record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 control (val_bpb=1.1231, 8xH100 verified) | AbhisekBasu1 | #429 |
| 1.1233 | Record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 (val_bpb=1.1233) | signalrush | #414 |
| 1.1233 | 5 novel architecture ablations on SOTA baseline | ssatia | #584 |
| 1.1233 | [WIP] Non-record: Local Ablation Pipeline — EMA + Int6 + Partial RoPE (GTX 1650) | gthgomez | #682 |
| 1.1234 | Add non-record 10min submission: 11L XSA4 + EMA + GPTQ + FA3 (1.12336724) | NewyorkDev | #636 |
| 1.1234 | Record: 11L LeakyReLU² + VRL + lzma — val_bpb 1.1234 | anthony-maio | #657 |
| 1.1236 | Late Training Replay + EMA + GPTQ-lite (val_bpb=1.1236, 2-seed, no TTT on eval) | newjordan | #445 |
| 1.1239 | Notable Non-Record Submission: 1.1239 BPB - 106.2M Binary Asymmetric U-Net + NeoMuon + 4xrelu²MLP + Smear + Fact Tied Emb + Poly5 Softcap + YaRN2048 + 8192BPE + FP8 + Bit-packing LZMA + Stride-16 Eval - 2h | CiprianFlorin-Ifrim | #641 |
| 1.1240 | Submission: 11L EMA + GPTQ-lite + Int6 (val_bpb: 1.1240) | Dhruba531 | #710 |
| 1.1240 | Non-record: GQA + LZMA + SLOT eval optimization (val_bpb=1.1240) | ibarrajo | #1249 |
| 1.1243 | Record: 11L + EMA + Tight SWA + QAT0.15 + VE128 + Partial RoPE + LN Scale (val_bpb: 1.1243) | newjordan | #401 |
| ★ | 1.1246 | Record: 11L + Tight SWA + Shared VE128 + Partial RoPE + LN Scale + XSA4 (val_bpb: 1.1246) | unnir | #374 |
| 1.1247 | Non-record: 11L PR315 Backout + Native FA3 RunPod (val_bpb=1.1247) | greqone | #394 |
| 1.1247 | Record: Parallel Muon + Parameter Banking — 81.87ms/step, val_bpb 1.1247 (3-seed mean) | abaybektursun | #399 |
| 1.1247 | REHA-DEQ-WSE: Deep Equilibrium with Weight Synthesis for Parameter-Efficient Language Modeling | sohv | #1323 |
| 1.1247 | Publish clean prune baseline 1.12470947 as non-record package | resouer | #1058 |
| 1.1248 | qat + ttt + value embeddings | bopmite | #218 |
| 1.1248 | Record: 11L Partial RoPE + LN Scale + EMA + XSA4 (val_bpb: 1.1248) | jfprincz | #315 |
| 1.1248 | Exploratory: PR315-derived candidate and looped-depth gate | Divyesh-Thirukonda | #453 |
| 1.1249 | [4090 Reproduction] Achieve 1.1249 val_bpb (Note: 18KB over limit) | samchill666 | #1717 |
| 1.1250 | Record: DominationV3 + GPTQ-lite + TTT25 (mean val_bpb=1.1250, 3 seeds) | yesbhautik | #64 |
| 1.1253 | Non-Record: 11L Parallel Muon + LeakyReLU² MLP3x + Legal TTT (val_bpb 1.1253) | aryanbhosale | #754 |
| 1.1254 | Record: 11L XSA+EMA+TTT, sliding val_bpb=1.1254 (3-seed mean 1.1256) | alertcat | #338 |
| 1.1257 | Non-record: Negative results & insights from 24hrs on 8xH100 | charmquark1984 | #375 |
| 1.1257 | Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257) | dannywillowliu-uchi | #379 |
| 1.1259 | Add competitive 8xH100 run package (1.1259 bpb) | adityakm24 | #1066 |
| 1.1261 | PP12: Bayesian posterior packets + selective gating (1.1261 BPB) | okezue | #1043 |
| 1.1264 | Non-record: GQA + LZMA + Selective Pruning (val_bpb=1.1264) | ibarrajo | #1248 |
| 1.1266 | sp4096 + 10L 3.5x MLP + GPTQ + TTT (1.1266 BPB) | Idan3011 | #1431 |
| 1.1268 | New SOTA: 1.12676 BPB - 11L XSA-all(11) + GPTQ-lite + EMA + Late QAT | gowtham0992 | #478 |
| 1.1269 | 11L VRL + Parallel Muon + Legal TTT v2 (val_bpb=1.1269, non-record) | ADIITJ | #1016 |
| 1.1270 | Record: 11L Tight SWA + Partial RoPE + LN Scale + XSA4 (val_bpb: 1.1270) | sadeghja1070 | #564 |
| ★ 1.1271 | Record: 11L XSA + EMA + Int6 MLP3x + WD=0.04 (val_bpb: 1.1271) | jfprincz | #287 |
| 1.1271 | Non-record: Custom serialization replacing torch.save + zstd-22 | joyceyan | #1649 |
| 1.1276 | Record: BESE Tokenizer 287 vocab — 1.1276 BPB | mrbese | #1327 |
| 1.1280 | 10-min record: 13L int4 MLP + qTTT + QAT Precompile + ANS Hybrid (val… | yunoshev | #1683 |
| 1.1284 | Neural Cache: Cross-Window KV Caching for Extended Eval Context (research proposal) | sseanliu | #318 |
| 1.1284 | Research: Why Novel Architectures Fail at 16MB — Throughput-Quantization Co-optimization | sseanliu | #831 |
| 1.1287 | Non-record: 11L XSA4 + EMA + SDTTT (3-seed mean val_bpb=1.1287) | dentity007 | #406 |
| 1.1289 | Non-record: Scylla Tokenizer Byte Accounting Audit — Sub-1.0 Was a Measurement Error | andrewbaggio1 | #1271 |
| 1.1290 | Add non-record pre-TTT anchor submission | amrayach | #1101 |
| 1.1295 | Record: Sponge Bath — TTT 8ep eval-only improvement (val_bpb: 1.1295) | newjordan | #390 |
| 1.1296 | Record: 11L CANON-AC(last5)+DeltaGate Report (Humble Record Attempt, val_bpb: 1.1296) | chanwoo-park-official | #400 |
| 1.1299 | Record: 11L Tight SWA + VE128 + XSA4 + TTT (3-seed mean val_bpb=1.1299) | kasimte | #455 |
| 1.1303 | Record: FarnsworthEngine v1 — TTT + 11L Int6 MLP3x, val_bpb=1.1303 | timowhite88 | #254 |
| 1.1303 | Non-record: 11L LeakyReLU² + EMA + LZMA Int6 (val_bpb: 1.1303, 2-seed mean) | htrung1105 | #1311 |
| 1.1307 | Record: 11L + Efficient Partial XSA (val_bpb: 1.1307) | unnir | #265 |
| 1.1307 | Non-record: 8xH100->1xH100 Two-Stage GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes | Jaksenc | #1475 |
| 1.1309 | Record: 11L EMA + Int6 + XSA + LeakyReLU² + Partial RoPE (val_bpb: 1.1309) | parinzee | #493 |
| 1.1311 | Record: 11L XSA4 + EMA + LoRA TTT + Partial RoPE + dim480 — val_bpb 1.13112 (3-seed) | dentity007 | #1127 |
| ★ 1.1318 | 11-Layer Int6 + WD=0.04 + SWA + FA3 (val_bpb: 1.1318) | jfprincz | #198 |
| 1.1320 | Record: 12L Gradient-Guided Quant + Partial RoPE + LN Scale + EMA + XSA4 (val_bpb: 1.1320) | saml212 | #332 |
| 1.1320 | Record: 11L Full Stack + XSA4 + Tight SWA + Late QAT (val_bpb=1.1320) | joelnishanth | #383 |
| 1.1324 | Non-record: Depth Recurrence + Int7 Mixed Quant — val_bpb 1.1324 (3-seed mean) | iverbovoy | #1453 |
| 1.1324 | Record: 11L + XSA4 + EMA + Late QAT + GPTQ-lite (1.1325 BPB) | pragnyanramtha | #531 |
| 1.1326 | Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB) | JoeProAI | #861 |
| 1.1326 | Draft: SOTA+ TTT + RoPE50K + EMA + Curriculum (pending H100 run) | 0xjaishy | #223 |
| 1.1327 | Record: 7L MLP3x + BigramHash + SmearGate + TTT 5ep (mean val_bpb=1.1327) | sofiabod | #489 |
| 1.1328 | Submission: 11L NTK-RoPE + FA3 + Batch524K + XSA4 + EMA (val_bpb=1.1328) | signalrush | #369 |
| 1.1329 | Non-record: Negative findings on codebook quantization, magnitude pruning, multi-token prediction, embedding factorization | mrdavtan | #212 |
| 1.1330 | Non-record: 11L MLP3.5x LeakyReLU(0.5)^2 + Full SOTA Stack (mean val_bpb=1.1330, 8xH100) | aryanbhosale | #344 |
| 1.1330 | Non-record: 11L MLP3.5x LeakyReLU(0.5)^2 + Full SOTA Stack (mean val_bpb=1.1330, 8xH100 SXM) | aryanbhosale | #635 |
| 1.1334 | Non-Record: BPB 1.1334 — 7000-Step Training + Mixed Int6/Int8 Quantization + Legal TTT | Christopher-Lee-McClendon | #598 |
| 1.1335 | Non-record: Verifily Three-Tier Token Weighting + DCLS Salience (SP1024, 1.1335 BPB) | arsenis-cmd | #1634 |
| 1.1336 | Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1336 (15.59 MiB) | JoeProAI | #1040 |
| 1.1345 | 11L MLP3x + Int6 QAT + XSA + EMA + BigramHash + FA3 (val_bpb 1.1345) | tmustier | #359 |
| 1.1346 | Track 10min_16mb: PR #287 family rerun at 585s wallclock (mean val_bpb=1.1346) | tmustier | #483 |
| 1.1347 | Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request) | Christopher-Lee-McClendon | #1166 |
| 1.1349 | SOTA Submission (1.1349 BPB) by weywey [10min_16mb track] | Upsalla | #646 |
| 1.1349 | Track A: 11L U-Net + BigramHash + SmearGate + Partial RoPE + QAT (1.1349 bpb) | Omrigotlieb | #1086 |
| 1.1349 | Non-record: Online Hessian GPTQ (val_bpb=1.1349) | ibarrajo | #1251 |
| 1.1352 | Non-record: Fixed Bank QAT + XSA5 + Label Smoothing (1.1352) | suchitj2702 | #667 |
| 1.1354 | Record: 11L EMA + BigramHash(12288) + Mixed Int5 + FA3 (1.1354) | simonbissonnette | #466 |
| 1.1354 | Record: 11L + Partial XSA + TTT + BatchOpt (val_bpb=1.1354) | ibarrajo | #290 |
| 1.1354 | Non-record: 1.1354 BPB — 10L TTT 22ep AdamW Cosine + LeakyReLU(0.5)² + TrigramHash | bigbag | #562 |
| 1.1355 | The Frugendorff: Recursive Weight Sharing for Transformer Compression (1.1478 BPB, 15.19MB) | newjordan | #579 |
| 1.1356 | Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1356 (15.60 MiB) | JoeProAI | #1041 |
| 1.1357 | Record: 11L XSA4 + EMA + Batch524K + zstd fallback (val_bpb: 1.1357) | dennisimoo | #307 |
| 1.1360 | Record: 11L XSA6 + Warmdown3000 + QAT@0.30 (val_bpb=1.1352, 2-seed mean) | 0xNoramiya | #695 |
| 1.1361 | 11L + XSA4 + EMA(0.997) + seq2048 + Int5-MLP + MuonWD=0.04 + LateK-FP16 — val_bpb=1.1361 | HyperPotatoNeo | #372 |
| 1.1364 | Record: 11L Backout + Int6 + SWA (val_bpb: 1.1364) | sheeki03 | #339 |
| 1.1364 | Record: Dynamic Eval + TTT on SOTA Pipeline (val_bpb=1.1364) | translatingthename | #397 |
| 1.1364 | Non-Record: Ouroboros — Crawler Architecture Research (1.1364 BPB) | newjordan | #1308 |
| 1.1365 | 10L XSA + EMA + Partial RoPE + LN Scale (val_bpb: 1.1365) | ofirkris | #458 |
| 1.1365 | 11L + Hadamard Rotation + VE128 + cuDNN SDPA (val_bpb: 1.1365, 3-seed mean) | EaCognitive | #586 |
| 1.1366 | 10L XSA + EMA + Partial RoPE + LN Scale (val_bpb: 1.1366) | ofirkris | #452 |
| 1.1370 | 10L XSA + LeakyReLU² + Partial RoPE (val_bpb=1.1370) | parinzee | #434 |
| 1.1371 | Non-record: EMA+SWA Tight Averaging with Fused TTT LoRA + Sliding Window (1.1371 BPB) | yunoshev | #1366 |
| 1.1372 | Crawler Transformer 3f+2cx2 + SP8192 + SDClip + Post-Quant TTT — val_bpb 1.1372 | Tonyy1977 | #1579 |
| 1.1373 | Ouroboros — 1.13727008 val_bpb (seed 444) | newjordan | #1283 |
| 1.1374 | Record: val_bpb: 1.14020 [tested 3x on 8xh100] | andrewgcodes | #267 |
| 1.1381 | Non-record: val_bpb=1.1374, FA2+SWA adaptation of Farnsworth | charmquark1984 | #281 |
| 1.1381 | Non-record: v6.2 Phase 5a SOTA-trivial stack (3-seed re-run @66% = 1.138112; TTT 1.204 not competitive) | sisegod | #1465 |
| 1.1382 | Submission: DominationV2 + BOS-Reset Bigram Cache + TTT (val_bpb=1.1382, 3-seed mean) | shouryamaanjain | #958 |
| 1.1387 | Record: 1.1387 BPB — 11L LeakyReLU² + Early QAT@0.5 + GPTQ-lite + EMA | 0xadvait | #979 |
| 1.1387 | Non-Record: BPB 1.13872 — LeakyReLU(0.5)² + Per-Layer LR Legal TTT (3 seeds) | Christopher-Lee-McClendon | #537 |
| 1.1388 | Submit Int6 QAT parameter-golf entry | malc3om | #403 |
| 1.1396 | Record: 11L Gradient-Guided Adaptive Quant + EMA + Sliding Eval (val_bpb=1.1396) | albertorkive | #422 |
| 1.1399 | Record: 11L XSA + EMA + Int5-MLP (val_bpb=1.1399) | Mapika | #349 |
| 1.1399 | Record: 11L Next-Gen Stack + Custom Kernels, val_bpb=1.1399 | anthony-maio | #376 |
| 1.1400 | Record: 11L Int6 + SmearGate + Batch Optimization (val_bpb=1.1400) | saml212 | #236 |
| 1.1400 | Feature/sota optimizations | adityagupta26 | #358 |
| 1.1400 | feat: Ultimate SOTA submission - 10L Model, Mixed Int6 QAT, and TTT/LoRA Evaluation | adityagupta26 | #361 |
| 1.1401 | Record: 11L XSA + EMA + TTT + Partial RoPE + LN Scale — val_bpb=1.1401 | mrdavtan | #371 |
| 1.1402 | QAT x SWA Ablation: SWA sabotages QAT (-3.64 mBPB, 3-seed validated) | alexanderaperry-arch | #989 |
| 1.1403 | [Record] Stride-32 + Warmdown/Muon Tuning on SOTA #1: mean val_bpb=1.1403 | haikosys | #274 |
| 1.1407 | 12 layers GPT + MLP_MULT reduction + VE and BIGRAM modifications | rubenbalbastre | #845 |
| 1.1407 | Record: 1.1407 BPB — LeakyReLU^2 + Delayed QAT + Score-First TTT | Dhenenjay | #1087 |
| 1.1412 | Record: Unified Attention + FA3 + Legal TTT (val_bpb=1.1412, 3-seed) | VirajDeshwal | #1202 |
| 1.1412 | 12L XSA-all + Partial RoPE + Batch 786K (1.1412 BPB, 13.5 MB) | KevinChunye | #1630 |
| 1.1412 | [Non-record] Universal Transformer Depth Recurrence INT6 | thestbobo | #1640 |
| 1.1417 | Non Record: Add PPM heuristic for test time learning | AnirudhRahul | #511 |
| 1.1418 | Non-record: 27M params at Int5 QAT / train larger, quantize harder (val_bpb=1.1418) | cmcdnd | #469 |
| 1.1418 | Non-record: VR + GA + Late QAT + Full GPTQ — 1.1418 BPB, 15.7 MB | anantdgoel | #601 |
| 1.1422 | Add non-record 4xH100 10L Int5-MLP submission | ReNothingg | #602 |
| 1.1425 | Non-record: 11L + 30-Epoch Legal TTT (BPB 1.14252) | Christopher-Lee-McClendon | #526 |
| 1.1426 | Non-record: QAT & EMA negative results on SOTA stack (val_bpb=1.1426) | MultiFe22 | #360 |
| ★ 1.1428 | Record: 10L Int5-MLP + BigramHash(10240) + SWA(0.4) + WD=0.04 (val_bpb=1.1428, mean 3 seeds) | thwu1 | #180 |
| 1.1428 | Non-record: 4090 single-GPU ablations on ValCalib GPTQ + XSA stack (partial logs) | Wolfie8935 | #1226 |
| 1.1428 | Value Residual + Gated Attention + XSA + EMA + AdamW TTT — val_bpb pending H100 | sahiee-dev | #430 |
| 1.1428 | [track_10min_16mb] 50-Epoch Cosine LoRA TTT + SOTA (10L Int5/Int6 BigramHash SWA) — Atharva Date (ADIITJ) | ADIITJ | #467 |
| 1.1428 | Record: 11L NonTTT VR+GA MixedInt5/6: val_bpb=1.1428 (3-seed, 8xH100) | Asukabot0 | #516 |
| 1.1428 | Add submission: 10L Enhanced with BigramHash(12240) + SOTA techniques | instax-dutta | #563 |
| 1.1428 | Depth Recurrence (3+3 x 2 loops) + HW Optimizations | maorinka | #648 |
| 1.1428 | Non-record: Technique Taxonomy — Tier List, Interaction Effects, and BPB Verification Tools | robbiebusinessacc | #891 |
| 1.1428 | Non-record: Technique Taxonomy — Tier List, Interaction Effects, and BPB Verification Tools | robbiebusinessacc | #892 |
| 1.1431 | Bigram-Aware Context Modeling with Mixed-Precision Quantization (val_bpb: 1.1431) | CREVIOS | #443 |
| 1.1431 | Bigram-Aware Context Modeling with Mixed-Precision Quantization (val_bpb: 1.1431) | CREVIOS | #447 |
| 1.1431 | Non-record: Turbo-Muon + EngramLite(10240) + VE(8,9,10) — val_bpb 1.1431 | SergheiBrinza | #1205 |
| 1.1433 | 12L Int5-MLP + SmearGate + BigramHash + SWA (val_bpb 1.1433) | unixmadtoonslab | #76 |
| 1.1434 | Add non-record 16MB submission: Vocabulary1792 FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ MixedBits | shram86 | #1474 |
| 1.1436 | [Non-record] XSA + EMA + TTT: Negative interaction study (val_bpb=1.1436) | sseanliu | #303 |
| 1.1441 | Progressive Depth + Hedge Mixer — val_bpb 1.1441 (3-seed mean) | iverbovoy | #1384 |
| 1.1442 | Record: 11L XSA4 + EMA + TTT + Int6 MLP3x (val_bpb=1.1442) | chris-buckley | #317 |
| 1.1443 | Non-record RTX4060Ti 11L LeakyTTT 24h local (1.1443 BPB) | monkeyKingProgrammer | #1244 |
| 1.1444 | Non-record: LeakyReLU(0.5)^2 on SmearGate + BigramHash + Int6 stack (1.1444 bpb) | oidebrett | #1256 |
| 1.1444 | Non-record: 11L DepthRec PolarNS SWA | anderamondarainh-stack | #1661 |
| 1.1444 | Submission/qat bigram12k stride32 | EthanYangTW | #348 |
| 1.1446 | Non-record: 11L Depth Recurrence + High-Yield Legal TTT (1.14458 BPB) | Christopher-Lee-McClendon | #461 |
| 1.1448 | Non-record: QAT Int5/Int6 on #1 architecture (1.14476 BPB) | xuafeng | #306 |
| 1.1448 | submission: LeakyReLU2 + TrigramHashEmbedding (1.1448 bpb) | BhatiaUday | #884 |
| 1.1450 | Submission: TrigramHash + PartialRoPE + HeadTemp + stride32 (val_bpb: 1.1450) | Ananddna | #327 |
| 1.1450 | Non-record: GDN Hybrid (E2E TTT / State-Space Model) — val_bpb 1.14502 | andrewbaggio1 | #1479 |
| 1.1451 | 11L + LN Scale + BigramHash 3072x112 + GPTQ: val_bpb=1.1451 | jayzuccarelli | #1675 |
| 1.1452 | Non-record submission: 11L XSA4 + EMA + BigramHash3072 + LZMA (1.1452 BPB) | Buld1n | #1386 |
| 1.1453 | feat: parameter golf v15b - SWA tuned (1.1453 val_bpb) | aktasbatuhan | #574 |
| 1.1454 | WIP: Shared-transformer + warmdown-aligned training (not final submis… | leofeasby | #420 |
| 1.1454 | Non-record: Shared-weight transformer with extended warmdown (1.1454 val_bpb) | leofeasby | #470 |
| 1.1454 | Progressive Depth + Hedge Mixer — val_bpb 1.1454 | iverbovoy | #856 |
| 1.1454 | Record: LoRA TTT on GPTQ — val_bpb TBD (10min_16mb) | DilpreetBansi | #1457 |
| 1.1455 | 11L Int5-MLP + TTT-SGD + SmearGate + SWA (1.1455 BPB) | stukenov | #264 |
| 1.1455 | Non-record: Reproduction of SOTA #1 (SmearGate+BigramHash+Int6+SWA) on RunPod 8xH100 | AbhayAnandUCSD | #1071 |
| 1.1456 | Non-record: MoE exploration + multi-bit quantization analysis | imyesung | #480 |
| ★ 1.1458 | Record: Int6 MLP3x + SmearGate + BigramHash + MuonWD + SWA (mean val_bpb=1.1483) | raahilshah | #162 |
| 1.1460 | Non-record: Focal Loss (gamma=2.0) — val_bpb=1.1460 | ibarrajo | #1233 |
| 1.1461 | Non-record: AR Self-Generated GPTQ Calibration (val_bpb=1.1461) | ibarrajo | #1234 |
| 1.1462 | Add Looped Transformer Design non-record submission (non tuned) | Aum08Desai | #325 |
| 1.1464 | Add LLMAdvisor submission: 1.14638 BPB (track_10min_16mb) | harborglowvintage-oss | #451 |
| 1.1464 | Add LLMAdvisor submission: 1.14638 BPB (track_10min_16mb) | harborglowvintage-oss | #665 |
| 1.1464 | Record: 12L RecycledCore Int5 — val_bpb 1.1464 (seed 1337) | shivangbaveja | #1573 |
| 1.1465 | Non-record: LLaDA-MDLM Diffusion — val_var_bpb 1.1465 (first diffusion to beat AR baseline) | agalimova | #1100 |
| 1.1465 | Non-record: MDLM Diffusion — val_var_bpb 1.1465 (first diffusion to beat AR baseline) | agalimova | #1106 |
| 1.1465 | Non-Record: HybridQuantGPT v6.1 H100 + Aggressive SLOT (steps=100, 3-seed 1.146523) | sisegod | #1456 |
| 1.1466 | Record: 11L Int5-All + XSA5 + EMA + 10% Pruning (val_bpb=1.1466) | trasnake87 | #389 |
| 1.1466 | Non-record: 11L mixed int5/int6 + working QAT + TTT (val_bpb=1.1466) | vytautas-bunevicius | #421 |
| 1.1466 | Record: 12L + Catalytic Residuals + BigramHash(10240) + SWA + Late QAT (val_bpb=1.1466, mean 3 seeds) | zachgoldfine44 | #450 |
| 1.1470 | [Non-Record] Hymba-8L: Hybrid SSM + Sliding Window Attention with 32K Context (1.1470 BPB) | mkenney2 | #1245 |
| 1.1470 | Non-record: XSA F.normalize fix + byte-shuffle/brotli + Muon WD as compression knob | Bananakin1 | #1709 |
| 1.1472 | Record: 11L, int6+zstd, decoupled WD (val_bpb = 1.1472) | devin-cog | #179 |
| 1.1473 | Non-record: Mamba-3 Hybrid SSM + SP8192 + Legal TTT — 1.1473 bpb | mradassaad | #1643 |
| 1.1473 | Non-record: Mamba-3 Hybrid SSM + SP8192 + Legal TTT — 1.1473 bpb | mradassaad | #1644 |
| 1.1476 | Submission: 12L Int5-MLP BigramHash10K EMA (1.1476 BPB) | Skytuhua | #592 |
| 1.1477 | Non-record submission: BigramDim160 + 10% Prune + SWA (1.14767 bpb, 2 seeds) | bryjudy | #637 |
| 1.1477 | [Record Submission] QAT Int5/Int6 + Backout + U-Net Skips + BigramHash(10240) + SWA50 — val_bpb=1.1477 | gowtham0992 | #295 |
| 1.1478 | Results of 2026-03-23_MixedQAT_Int5MLP_Int6Attn | StolbaJ | #709 |
| 1.1478 | Record: 11L Int6 QAT + SmearGate + OrthoInit + SWA + TTT (val_bpb=1.1478) | yahya010 | #150 |
| 1.1478 | The Frugendorff: Recursive Weight Sharing + MLP 4x (1.1478 BPB, 15.19MB) | newjordan | #498 |
| 1.1478 | The Frugendorff: Recursive Weight Sharing + MLP 4x (1.1478 BPB, 15.19MB) | newjordan | #499 |
| 1.1478 | Pre-Enrichment + EMA-GPU + SmearGate + XSA4 (val_bpb=1.1478, … | Idan3011 | #996 |
| 1.1480 | Record: 11L Int6 QAT + SmearGate + SWA + SAM: 1.1480 BPB (3-seed mean) | baudrillardsgh0st | #194 |
| 1.1483 | Add non-record 10min/16MB submission: Wavelet-Lite PR549 Parallel Muon (1.1483) | bro4all | #680 |
| 1.1487 | 10L MLP3x + BigramHash(2048) + SWA + Stride-32: 1.1487 BPB | Rhodrium | #331 |
| 1.1488 | Non-record: 11L Int6 QAT + SmearGate + SWA(0.4) + WD=0.04 (3-seed mean val_bpb=1.1488) | dentity007 | #385 |
| 1.1489 | Record: 10L Int5-MLP3x BigramHash4096 SlidingEval — mean val_bpb 1.1489 | suchihype | #583 |
| 1.1490 | Weight Entropy Regularization: Improved SWA Averaging (+0.028 BPB) | mer2234 | #459 |
| 1.1492 | Non-record: Asymmetric 1/10 Split — 1.1492 pre-quant BPB on 8xH100 (one-line change) | ranausmanai | #1275 |
| 1.1493 | SpotlightLFB + Aux-Int6 Compression — 1.1493 BPB (3-seed mean) | ymrohit | #1142 |
| 1.1497 | Record: 11L Int6+Zstd MLP3x SmearGate BigramHash OrthoInit MuonWD EMA (mean val_bpb=1.1497) | mkenney2 | #362 |
| 1.1497 | Record: Batch-Optimized 524K + Warmdown 4000 (val_bpb 1.1497) | shikhar1729 | #364 |
| 1.1497 | Submission: SP8192 + Depth Recurrence + Muon 0.99 (1.1497 pre-quant BPB) | DevelopedByAnurag | #1739 |
| ★ 1.1502 | Update: 11L MLP3x + WD=0.04 + zstd-22 (val_bpb 1.1502) | aruniyer | #86 |
| 1.1502 | Record: 11L Int6 QAT + SmearGate + WD 0.038 (val_bpb=1.1502) | baudrillardsgh0st | #192 |
| 1.1502 | nGPT on the Hypersphere: Making Normalized Transformers Work at 16MB (Research) | DbBested | #1108 |
| 1.1507 | Record: Int6 STE + SmearGate + Seq2048 + OrthoInit + RoPE50K + SWA/100 (mean val_bpb=1.1507) | dexhunter | #206 |
| 1.1507 | [10min/16MB] AWQ + Cyclic Momentum + ReLU² + 11L Shared — 1.1507 bpb | SPThole | #623 |
| 1.1507 | 10L Int5-MLP + BigramHash(4096) + SWA (1.1507 BPB) | Bortlesboat | #694 |
| 1.1508 | Record: 10L d=512 Int5-MLP Int6-Attn sp1024 (val_bpb=1.1508) | LoquiAuris | #465 |
| 1.1509 | [Non-Record] XSA-all-layers + VRL + bigram3072 + lzma9 — 1.1509 bpb, AdamW TTT findings | Hilo-Hilo | #1045 |
| 1.1510 | Non-record submission: 1.15 BPB in 16MB (GPTv3) | LappyG | #1068 |
| 1.1511 | FP8 + Arithmetic Coding + SWA (1.1511 BPB) | cruz-andr | #538 |
| 1.1518 | SmearGate + BigramHash + Int6 + SWA + U-Net Skips (1.1518 BPB) | integrate-your-mind | #289 |
| 1.1520 | Non-record: 11L int5/int6 + XSA + online TTT w/ decay prior (single-run val_bpb=1.1520) | JackYoung27 | #302 |
| 1.1520 | Non-record: Knowledge Distillation - A Negative Result (val_bpb=1.152) | fielding | #1029 |
| 1.1521 | Non-record: TurboQuant mixed-precision int4/int5 (val_bpb=1.1521) | ibarrajo | #1238 |
| 1.1522 | Record: 10L CountInitBigram + XSA + PartialRoPE (val_bpb=1.1522) | harsha-gouru | #477 |
| 1.1522 | Record: 10L CountInitBigram + XSA + PartialRoPE (val_bpb=1.1522) | harsha-gouru | #482 |
| 1.1522 | Record: 10L CountInitBigram + XSA + PartialRoPE (val_bpb=1.1522) | harsha-gouru | #485 |
| 1.1524 | Submission: OrthoInit + Int6 MLP3x + SmearGate + BigramHash (val_bpb: 1.1524) | jfprincz | #164 |
| 1.1526 | PROTEUS v9 — 11L INT6 + single-epoch LoRA TTT (mean val_bpb=1.1526, 3 seeds) | MatoTeziTanka | #633 |
| 1.1526 | Non-record: Mamba-3 Hybrid + Full Hessian GPTQ + Late QAT — val_bpb 1.1526 | mradassaad | #1355 |
| 1.1527 | Non Record: Partially Random MLP | meinlebenswerk | #1228 |
| 1.1531 | BESE v5.3: Novel 288-token tokenizer (non-record 16MB) | mrbese | #1621 |
| 1.1531 | Record: BESE 288-vocab Novel Tokenizer — 1.1531 BPB (3-seed mean) | mrbese | #1666 |
| 1.1532 | Add non-record shared-weight Frugendorff submission | siddhantparadox | #773 |
| 1.1532 | Record submission : Int6 + MLP 3x + Flash Attention 3 + NorMuon, val_bpb = 1.1532 | tamoghnokandar | #173 |
| 1.1532 | Non-record submission: Depth Recurrence + Legal Score-First TTT (10L, 1.1532 BPB) | Christopher-Lee-McClendon | #456 |
| 1.1533 | Non-record: KNN Hidden State Retrieval — Scale Deception from Weak to Strong Models (8xH100) | himanshudongre | #1259 |
| 1.1536 | [Non-Record] Competitive Baseline: 10L GQA + Mixed Int6/Int8 + SWA + Seq4096 (val_bpb=1.1536) | rithunkp | #1065 |
| 1.1537 | Add ContextFuse-2048-BigramSmear submission | Julz19 | #174 |
| 1.1538 | Add non-record unlimited-compute 11L LeakyTTT 16h local RTX 4060 Ti run | monkeyKingProgrammer | #1008 |
| 1.1539 | Record: OrthoInit + Int6 MLP3x + BigramHash + SmearGate (val_bpb: 1.1539) | unnir | #135 |
| 1.1539 | [Record Submission] - 74.3M Ternary U-Net Transformer (v2 — Continuation from PR #640) | CiprianFlorin-Ifrim | #920 |
| 1.1541 | Non-record: 12L Int5-MLP + Int6-Attn mixed quantization, val_bpb=1.1541 | alertcat | #219 |
| 1.1541 | Record: Int6 + MLP 3x + NorMuon + SmearGate + BigramHash + OrthoInit + Sliding Window, val_bpb=1.1541 | MatthewHRockwell | #230 |
| 1.1547 | Non-record: 11L XSA-All + EMA + Legal GPTQ on 1xH100 PCIe (1.1546 bpb) | Rtx09x | #1353 |
| 1.1548 | Non-Record: 11L Low-Rank on Q192 (val_bpb=1.1548) 14.7MB in decimal | JayCheng113 | #215 |
| 1.1550 | Record: Long Context + All Optimizations submission | chinesepowered | #166 |
| 1.1551 | LAWA-EMA frontier fork (pr198 base, SWA -> LAWA val_bpb=1.1551) | machdragon | #201 |
| 1.1552 | feat(arch): Mish² Activation & PyTorch Native SDPA GQA Core (1.155 BPB) 8xH100 | demirelo | #653 |
| 1.1554 | Add PR114 RunPod H100 SXM non-record submission | greqone | #252 |
| ★ 1.1556 | Record: Mixed Quant Int6/FP16 + SmearGate + OrthoInit + MLP 3x + Sliding Window, val_bpb=1.1556 | aquariouseworkman | #65 |
| 1.1558 | Record: 1.1558 BPB — 11L U-Net + Catalytic + SwiGLU + SW64 | skarakulak | #507 |
| 1.1565 | 11L XSA + SmearGate + BigramHash + SWA (mean val_bpb=1.1565, 3 seeds) | mahsumaktas | #186 |
| 1.1565 | 11L XSA4 + SmearGate + BigramHash + SWA + RoPE50K (mean val_bpb=1.1565, 3 seeds) | mahsumaktas | #333 |
| 1.1567 | Non-record: Focal Loss for LM Pretraining — 1.1567 int8 BPB on RTX 4000 Ada (3-line change) | ranausmanai | #1380 |
| 1.1568 | Add 2026-03-20 11L dense-lexical submission candidate | ajkpersonal | #207 |
| 1.1568 | Staging: Int6 MLP3x 11L + SmearGate + BigramHash4096x128 + MuonWD038 + SWA50 + DocSliding (single-run val_bpb=1.1568) | ajkpersonal | #208 |
| ★ 1.1570 | Record Submission: 1.1570 BPB - 73.7M Ternary U-Net + NeoMuon + 4x relu²MLP + Factored Tied Emb + Poly5 Softcap + YaRN2048 + 8192BPE + FP8QAT + Bitmask-LZMA + Stride-16 Sliding | CiprianFlorin-Ifrim | #640 |
| 1.1570 | Fix: move Ternary UNet submission folder from track_10min_16mb to track_non_record_16mb | janwww | #730 |
| 1.1573 | Non-record: 11L XSA + Score-First LoRA TTT (1.1573, 1xH100) | swapp1990 | #1090 |
| 1.1574 | Record: val_bpb=1.1574 — Int6 + MLP 3x + selective precision + optimized long-context training | saml212 | #114 |
| 1.1574 | submission: 10L Int5-MLP + Aggressive Warmdown (WD=20000) — targeting <1.14 bpb | outsourc-e | #365 |
| 1.1574 | Non-record: 10L Int5-MLP + TTT + Backout Connection (val_bpb=1.1574 on 8xH100 SXM) | shivnarainms22 | #366 |
| 1.1574 | 12L INT4 bQAT + Value Embeddings — val_bpb 1.1588 | SoHarshh | #1009 |
| 1.1574 | 11L INT6 XSA-all + EMA + VE — ttt_bpb 1.1487 | SoHarshh | #1216 |
| 1.1575 | Non-record: 10L Int6 QAT + SmearGate + SWA (val_bpb=1.1575) | dentity007 | #273 |
| 1.1576 | Non-record: Legal Neural-Only No-TTT Alt (8xH100) val_bpb=1.1576 | aamodbhatt | #947 |
| 1.1580 | record: 1.158 | krammnic | #106 |
| 1.1580 | Non-record: CoDA-GQA Differential Attention — First Differential Attention Submission (val_bpb=1.1580) | anthony-maio | #932 |
| 1.1585 | Non-record: Gaussian per-token loss reweighting — what goes wrong and why (+0.014 bpb) | JulianTang2027 | #1360 |
| 1.1590 | record: 10L d496 WarmDown3500 SWA — val_bpb 1.1590 (1xH100 proxy) | Hilo-Hilo | #901 |
| 1.1591 | Record: 11L XSA4 + EMA + Partial RoPE + Rank-8 TTT Hooks (1.1591 bpb) | Divyesh-Thirukonda | #492 |
| 1.1594 | Record: Int6 MLP3x + STE QAT + Sliding Window (val_bpb=1.1594) | rsavitt | #128 |
| 1.1596 | Add SP4096 11L432 MLP3x Int6+Zstd Momentum99 record (val_bpb=1.1596) | kshitizz36 | #251 |
| ★ 1.1598 | Record: 10L Int6 QAT + Zstd MLP2.6x Muon0.99 Sliding Window (val_bpb 1.1598) | yahya010 | #63 |
| 1.1598 | Record: Compression-Funded MLP3x (val_bpb=1.1598) | chris-buckley | #191 |
| 1.1601 | Non-record: WiderMLP + FP16 Embed + Stride-32 (val_bpb=1.1601) | ansh-deriv | #222 |
| 1.1601 | 12L rANS + LeakyReLU(0.95)² + Soft XSA (1.1601 BPB, non_record_16mb) | turbo-indubitable | #1215 |
| 1.1602 | feat(record): Int6 STE + NorMuon + SWA + Sliding Window (val_bpb=1.16019) | dexhunter | #156 |
| 1.1603 | Record: Sliding Window Eval, 2048 Vocab Size, fp16 embeddings, SWA, NorMuon, FA3; mean_val_bpb:1.160 | mtybadger | #122 |
| 1.1604 | Cautious Muon + SP4096 + Depth Recurrence — val_bpb 1.1604 (non-record) | X-Abhishek-X | #1381 |
| 1.1605 | Record: Int6 MLP3x + MTP + Sliding Window Eval (val_bpb=1.1605) | seanward | #88 |
| 1.1605 | submission: Int6 MLP3x + Late-K Passthrough + SlidingWindow (val_bpb: 1.1605) | takhir-iota | #99 |
| 1.1606 | Non-record: Legal Neural-Only No-TTT (8xH100) val_bpb=1.1606 | aamodbhatt | #946 |
| 1.1609 | Non-record: 11L Int6 + Online Logit Bias (val_bpb=1.1609) | bopmite | #330 |
| 1.1618 | Int6 MLP3x + Tuned LR + SmearGate + SlidingWindow (val_bpb: 1.1618) | unnir | #102 |
| 1.1622 | record: val_bpb=1.1622, NorMuon + int6 STE + SWA + sliding window | vmfunc | #89 |
| 1.1623 | Record: MLP3x + Int8 Tok Emb + Grouped LZMA + Sliding Window (val_bpb=1.1623) | ChaseWNorton | #160 |
| 1.1624 | Add non-record 11L int6 challenger 8xH100 attempt | JWLBOYCE | #209 |
| 1.1628 | Record: 10L Int5-MLP + SmearGate + BigramHash + Late QAT (val_bpb=1.1628) | chris-buckley | #286 |
| 1.1629 | Record: Pre-Enrichment + Encoder Recurrence + XSA + SmearGate + BigramHash (val_bpb=1.1629) | Idan3011 | #187 |
| 1.1629 | Late STE QAT + Int6 MLP3x + SmearGate + BigramHash + OrthoInit + Overtone + SWA + SGD TTT (int6+zstd-22) | davidpuertolas | #297 |
| 1.1631 | Record/smaller batch sota, val_bpb 1.16314679 (post-quant, int6+zlib, sliding eval) | ankitmaloo | #147 |
| 1.1632 | ArjunAutoResearch: MLP 3x + STE int6 QAT + seq4096 + sliding window. val_bpb 1.1632 | arjun-krishna1 | #66 |
| 1.1634 | Record: SwiGLU + BigramHash + SWA, val_bpb=1.1634 (8xH100 verified) | JoeProAI | #373 |
| 1.1634 | Non-record submission: RecurrentTiedDepth_8x2_FiLM records | loveless2001 | #552 |
| 1.1639 | Non-record submission: Weight-Tied 6Lx2 d=672 (1.1639 BPB) | yuitokyouni | #1568 |
| 1.1642 | Add MergedTop3_v3 clean 8xH100 record-track submission | hesong0222-dev | #698 |
| 1.1642 | Record: Vocab 4096 + MLP 3x + Sliding Window Eval (mean val_bpb=1.1642, 3 seeds) | saikrishnarallabandi | #123 |
| 1.1645 | [Non-record] Meta-Learned TTT + Error-Guided Adaptation Analysis (val_bpb=1.1645) | sseanliu | #294 |
| 1.1645 | [Non-record] Meta-Learned TTT + Error-Guided Adaptation Analysis (val_bpb=1.1645) | sseanliu | #296 |
| 1.1646 | Non-Record: McGilchrist Register Token — causal cumulative mean + FiLM global context pathway | aramdov | #1022 |
| 1.1648 | Int6+zstd MLP1488 + Sliding Window + QAT + Tuned LR (val_bpb=1.1648) | m0at | #107 |
| 1.1650 | 12L INT4 bQAT + EMA Fix + Deterministic QAT — val_bpb ~1.165 | SoHarshh | #1002 |
| 1.1653 | Add record: 9L MLP3x LeakyReLU(0.5)² QAT Int6+zstd (val_bpb=1.1653) | andreanjos | #929 |
| 1.1659 | Submission: Wider MLP 3x + int6 quant + sliding window eval, val_bpb=1.1659 | jfprincz | #70 |
| 1.1659 | Memory Tokens + Mixed Quantization (val_bpb: 1.1659) | sp00mm | #351 |
| 1.1659 | Memory Tokens + Mixed Quantization (val_bpb: 1.1659) | sp00mm | #352 |
| 1.1666 | Record: Int6 + MLP 3x + STE QAT + NorMuon + sliding window (val_bpb 1.1666) | abhishekgahlot2 | #116 |
| 1.1666 | Record: Int6 + MLP 3x + STE QAT + NorMuon + sliding window (val_bpb 1.1666) | abhishekgahlot2 | #137 |
| 1.1667 | Add Nuclear Stack submission: 1.16668 BPB (seed 2884431328) | timowhite88 | #178 |
| 1.1668 | Record: Int6 + Canon ACD (K=3) + Muon WD 0.04 + SWA + Sliding Eval (val_bpb=1.1668) | chanwoo-park-official | #312 |
| 1.1669 | Record: Int6 QAT + SmearGate + Muon WD (val_bpb=1.1669) | baudrillardsgh0st | #170 |
| 1.1670 | Record: SwiGLU + MLP 3x + Int6 + LoRA TTT, val_bpb=1.1670 (8xH100) | polarizedfortnite-cpu | #81 |
| 1.1672 | 12L Full-INT4 (MLP + Attn) + BigramHash(4096) — val_bpb 1.1672 | Naazimsnh02 | #305 |
| 1.1682 | Non-record: S4D-Lin SSM Hybrid — Fixing Why Mamba Failed in Parameter… | himanshudongre | #1013 |
| 1.1688 | Non-record: Emergent weight symmetry in QO projections + learnable SymMix | gersh | #1214 |
| 1.1690 | Non-record: 6-Technique Stack — Catalytic Residuals + Value Residual + Gated Attention + BigramHash(10240) + 12L (val_bpb=1.1690) | joshuaswarren | #474 |
| 1.1696 | Recursive Transformer 4B/7L + VE + QAT + TTT — val_bpb 1.1696 (3-seed mean) | Tonyy1977 | #927 |
| 1.1697 | [Non-record] LoRA TTT + HParams (val_bpb=1.16973333) | Mistobaan | #299 |
| 1.1702 | submission: Int6 MLP3x + QAT + SlidingWindow (val_bpb: 1.1702) | trovatochris | #117 |
| 1.1702 | [Non-Record] QAT + NTK-4096 Eval + Cosine Warmdown + Aggressive SWA | crony-io | #324 |
| 1.1704 | Record: Int6 3xMLP + Cosine Warmdown (val_bpb=1.1704) | kvmukilan | #243 |
| 1.1704 | non-record: int6 3xMLP + cosine warmdown (1.1704 bpb) | kvmukilan | #246 |
| 1.1704 | non-record: int6 3xMLP + cosine warmdown (1.1704 bpb) | kvmukilan | #249 |
| 1.1708 | SubSixteen v2: Int6 QAT + MLP 3x + SWA + Sliding Window (val_bpb 1.1708) | TevBenji | #69 |
| 1.1711 | Non Record: GPTQ int7 XSA BigramHash — val_bpb 1.1711 | Rajat123456789 | #1378 |
| 1.1715 | Non-record: PrismLM v3 — DiffTransformer V2 + NorMuon + TrigramHash (val_bpb=1.1715) | yashverms | #418 |
| 1.1719 | Add WaveletWeightedWidenet submission directory with README and metadata | dubthecat | #211 |
| 1.1720 | Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB) on 11L Production Stack | anantdgoel | #487 |
| 1.1722 | Add lzma6 submission (1.172 bpb, 10min_16mb) | lee101 | #329 |
| 1.1724 | Non-record: Compression moonshots — 8 negative/marginal findings (Procrustes, SWA smoothness, selective fp16, pruning+zstd) | mrdavtan | #1048 |
| 1.1725 | The Stinky Frost Recipe — 1.1725 BPB | newjordan | #190 |
| 1.1725 | Add non-record EMA and adaptive export exploration | someone114514 | #424 |
| 1.1732 | Add submission: 10L Slide64 Mid6, val_bpb=1.1732 | GLDRoger | #176 |
| 1.1734 | Non-record: LoRA TTT exploration on SOTA base (negative result) | hmlizama | #658 |
| 1.1734 | Non-record: Higher-Rank Output Heads — Standard Tied Head Wins on a Frontier 11L Baseline | albertorkive | #908 |
| 1.1739 | Non-record: 10L FP16-Embed + Warmdown20k | codestrongestx | #381 |
| 1.1744 | Add TTT (Test-Time Training) submission: 1.1767 BPB | timowhite88 | #152 |
| ★ 1.1748 | Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) | notapplica | #60 |
| 1.1750 | Add 10min/16MB record: skinny RLM seq2048 (int8+zlib val_bpb 1.1750) | k-oconnor | #575 |
| 1.1752 | Int5/Int6+Zstd+MLP3x: mean val_bpb=1.1752 (10L, seq4096, sliding window) | shajalahamedcse | #546 |
| 1.1752 | Record: Int5/Int6+Zstd+MLP3x — mean val_bpb=1.1752 (10L, seq4096, sliding window) | shajalahamedcse | #547 |
| 1.1753 | Record: SP4096 int6+zstd 10L496 overtone+phase sliding (val_bpb=1.1753) | kshitizz36 | #217 |
| 1.1761 | Nightcrawler — 1.176bpb 10mb | newjordan | #1208 |
| 1.1763 | Add non-record submission for full80 recurrence seq1536 20k | shasank0001 | #1345 |
| 1.1764 | Sliding Window + Long-Context Training: val_bpb=1.1764 | saml212 | #96 |
| 1.1768 | Add seq4096 sliding-window fp16 tok coarsen record | takhir-iota | #75 |
| 1.1770 | Non-record: BitNet b1.58 - 68M ternary params, val_bpb=1.1770, systematic analysis of ternary limitations | ksang123 | #367 |
| 1.1779 | DenseContextQuantTrim 8xH100: 1.1779 val_bpb | IvGolovach | #256 |
| 1.1779 | Add ContextFuse-2048 submission | Julz19 | #143 |
| 1.1787 | Record: 10L Seq2048 TTT LoRA WarmdownQuant (val_bpb=1.1787) | vishesh9131 | #310 |
| 1.1791 | Non-record: 11L 3x MLP Seq2048 — val_bpb 1.1791 (8xH100 SXM) | Rohan-Abhilash | #1505 |
| 1.1792 | MetaStack v3: 1.1792 sliding bpb, 10L BigramHash SmearGate OrthoInit SWA | xinpw8 | #205 |
| 1.1801 | Non-record: AutoResearch Value Embeddings + MLP3x, 1.1801 bpb (1x RTX 4090) | ivanontech | #1139 |
| 1.1801 | Non-record: AutoResearch Value Embeddings + MLP3x, 1.1801 bpb (1x RTX 4090) | ivanontech | #1141 |
| 1.1803 | SP8192 + 9-Layer + Breadcrumb Gating + EMA + Stochastic Depth - 1.1803 BPB (legal) | Unwindology | #1724 |
| 1.1804 | Non-record: 11L Partial RoPE + XSA4 + VE128 + Tight SWA + GPTQ-lite (val_bpb=1.1804) | rarce | #534 |
| 1.1804 | Non-record: 11L Partial RoPE + XSA4 + VE128 + Tight SWA + GPTQ-lite (val_bpb=1.1804) | rarce | #543 |
| 1.1807 | Submission: Atris Labs v8 (audited seed42, clean branch) | keshav55 | #671 |
| 1.1807 | Non-record: Int6 QAT + MLP1472 + SlidingWindow + TTT (val_bpb=1.1807) | lookin-zz | #301 |
| 1.1807 | Record: Atris Labs — 3-seed mean val_bpb=1.1807, 10L MLP3x Int5/Int6 BigramHash SmearGate SWA | keshav55 | #515 |
| 1.1807 | Int6 GPTQ-lite + LeakyReLU(0.5)^2 + EMA + 11L MLP3x | zeytx | #805 |
| 1.1812 | Add 3xMLP + Mixed Quant + Blockade/Sigma submission (val_bpb: 1.1812) | GMaN1911 | #172 |
| 1.1821 | Non-record: Semantic Tube Regularization — Geometry Improves, BPB Doesn't (Compute–Regularization Tradeoff) | albertorkive | #894 |
| 1.1826 | Non-record: Soft MoE Exploration — Dense Gating Fixes Sparse Router Collapse Under 16MB (WIP, val_bpb=1.1826) | HugoOchoaLP | #660 |
| 1.1828 | [Non-Record] Hymba: Hybrid Attention + Mamba SSM (val_bpb 1.1828) | mkenney2 | #599 |
| 1.1834 | Add non-record 16MB submission: FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ | shram86 | #1447 |
| 1.1834 | Add non-record 16MB submission: FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ | shram86 | #1448 |
| 1.1836 | PROTEUS EMA — val_bpb: 1.1836 (3-seed mean, Notable Non-Record) | MatoTeziTanka | #95 |
| 1.1839 | 11L + XSA + VRL + SWA + seq4096 + cross-doc TTT - val_bpb 1.1839 | carlesonielfa | #457 |
| 1.1844 | Non-record: Linearized Neural Memory + TTT (val_bpb=1.1844) | mihir-s-05 | #182 |
| 1.1854 | Non-record: No-FA3 stack combination — val_bpb 1.1854 (1-seed, 8xH100) | akaiHuang | #1442 |
| 1.1855 | Record: Pre-Enrichment + Encoder Recurrence (val_bpb=1.1855) | Idan3011 | #184 |
| 1.1858 | Record: 8192 Vocab Size, NorMuon, Selective Quantization; 1.186 val_bpb | mtybadger | #78 |
| 1.1864 | Add record: Optimizer Tuning + Sliding Window Eval (val_bpb=1.1864) | andreanjos | #321 |
| 1.1870 | Record: FP16 Embed + Sliding Window Eval + Warmdown Tuning (pending eval) | JoeProAI | #113 |
| 1.1873 | [Non-Record] Hymba-LongContext: 32K context training via hybrid SSM + SWA (1.1873 BPB) | mkenney2 | #914 |
| 1.1874 | Crawler — 8.8MB, 1.1874 BPB (3-seed mean, 8xH100, 600s) | newjordan | #1140 |
| 1.1875 | Non-record: Mamba3 Hybrid + GPTQ Long Context (1.1875 BPB) | samquiring | #1268 |
| 1.1876 | Record: sliding eval, FP16 tied embeddings, 10 layers, Muon WD 0.02, overtone init, and phase-transition residual mixing. (val_bpb 1.1876) | peytontolbert | #155 |
| 1.1882 | Preliminary: 11L VRL + Full GPTQ + Parallel Muon + Legal TTT — val_bpb 1.1882 (ADIITJ) | ADIITJ | #960 |
| 1.1884 | Add seq4096 fp16 tok coarsen record | takhir-iota | #74 |
| 1.1888 | 1.1888 BPB via SP-4096 compression + stride-64 sliding window | kshitizz36 | #53 |
| 1.1890 | 11L INT6 + Backward-Looking Per-Document LoRA TTT | haimianbaobao007 | #550 |
| 1.1893 | Non-record: staging profile (LAWA + slide eval) on 8xH100 (val_bpb=1.18926428) | machdragon | #197 |
| 1.1896 | [Non-Record] JEPA Self-Distillation with EMA Target Encoder for Autoregressive LM (val_bpb: 1.19) — Current Noisy/Negative Result | MVPandey | #896 |
| 1.1898 | Non-Record: DG Attention, Differential-Gated Attention with Depth-Scheduled Novelty Encoding: (val_bpb=1.1898) | ddavidgao | #542 |
| 1.1899 | Submission: 10L + Sliding Window eval (mean val_bpb=1.1899) | shajalahamedcse | #221 |
| 1.1903 | Non-record: Byte-level transformer + JEPA auxiliary loss (val_bpb: 1.1903) | jfprincz | #832 |
| 1.1914 | Record: CLASE-Quant adaptive layer quantization (val_bpb=1.1914) | NewyorkDev | #309 |
| 1.1915 | Non-record: Oscillatory Recurrence at Layer 0 (1.1915 BPB, 3-seed) | amabito | #1221 |
| 1.1917 | Non-record: random-map adapter train/eval ablations on 8xH100 | papalino456 | #1164 |
| 1.1920 | Restore non-record submission: 2026-04-08 Vocab1792 FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ | shram86 | #1496 |
| 1.1921 | SP8192 Depth Recurrence + Parallel Residuals + TTT (1.1921 BPB) | yu314-coder | #1628 |
| 1.1925 | Record: Quant Quality: val_bpb=1.1925 | ankitmaloo | #142 |
| 1.1925 | Record: Sliding Window Eval (stride=64), val_bpb=1.1925 | mattqlf | #50 |
| 1.1925 | Submission/2026 03 22 Sliding Window + WARMDOWN + AttnRes + PhiSimple (mean 1.1925 BPB) | ikermoel | #500 |
| 1.1928 | docs: add TIPS.md and resolve environment dependency issues (#280, #82, #43) | adityagupta26 | #357 |
| 1.1928 | Non-record: Single H100 10 min 1.24 BPB | adityasasidhar | #1547 |
| 1.1929 | Add non-record BigramHash4096 + MLP992 + LR0.08 + Slide64 submission | josusanmartin | #355 |
| 1.1929 | Non-record: SWA and doc-isolated eval ablation — two negative findings at stride=64 | mrdavtan | #199 |
| 1.1932 | Non-record: BitNet Ternary — 65M params in 15.9MB (1.1932 BPB) | chrislovescoding | #666 |
| 1.1933 | Record: 7L MLP3x 4kSeq LR-Tuned (val_bpb=1.1933) | sofiabod | #446 |
| 1.1935 | Non-record: 1x RTX PRO 6000 Blackwell 10L Int5-MLP (1.1935 BPB) | Rohan5commit | #560 |
| 1.1938 | Record: 8192 Vocab, Sliding Window Eval, Selective Quantization; 1.194 val_bpb | saikrishnarallabandi | #92 |
| 1.1942 | Record submission: Distill+IntraLoop SP1024 9x512 (val_bpb=1.1942) | divagr18 | #1623 |
| 1.1946 | Non-record: XSA-All + QK Gain 4.0 + LN Scale — 45 Experiments on 1×RTX 5090 | jainpranjal97 | #1125 |
| 1.1948 | VQ-VAE Weight Compression (non-record track) | WeijieChen2017 | #1335 |
| ★ | 1.1950 | [record bpb=1.195] sliding window + LoRA TTT | samacqua | #77 |
| 1.1957 | Add TTT-LoRA 512d submission (val_bpb=1.1957) | santosh5541 | #157 |
| 1.1957 | Record: Add TTT-LoRA 512d submission (val_bpb=1.1957) | santosh5541 | #159 |
| 1.1957 | Record: Add TTT-LoRA 512d submission (val_bpb=1.1957) | santosh5541 | #161 |
| 1.1962 | Non-record: DepthScale — Parameter-Shared Iterative Transformer (1.1962 BPB) | Lumi-node | #1509 |
| 1.1973 | Sliding Window Eval + Muon6 (val_bpb 1.1973) | beee003 | #169 |
| 1.1974 | Non-record: AutoResearch Batch Optimization — 1.1974 bpb (1× RTX 4090) | ivanontech | #1036 |
| 1.1978 | Merge: Autoresearch/mar28 experiments on 4xH20 | demouo | #1052 |
| 1.1980 | Progressive Depth Training — val_bpb 1.1980 | iverbovoy | #835 |
| 1.1986 | Non-Record Submission: 1.1986 BPB — HybridQuantGPT v6.1 rANS + Legal TTT | sisegod | #1123 |
| 1.1989 | Non-record: MUD optimizer — triangular Gram preconditioning (arxiv:2603.17970) | SelfAnush | #510 |
| 1.1995 | Shallow Blue: BOS-Reset Exact Memory Probe | jxgod | #1478 |
| 1.1996 | Int5 MLP + Int6 Attn + zstd-22, val_bpb 1.1996 | edidisheng | #1059 |
| 1.2005 | Non-record Submission: SwiGLU 3x + Dynamic Wallclock Cosine | yuvraajbains | #799 |
| 1.2006 | Add SmearGate+BigramHash context-repair submission (1.2006 BPB, 15.0MB) | handemanai | #448 |
| 1.2012 | Record: SP4096 + Int6 QAT + NorMuon (val_bpb=1.2012) | khasinski | #37 |
| 1.2012 | Record: SP4096 + Int6 QAT + NorMuon (val_bpb=1.2012) | khasinski | #200 |
| ★ | 1.2014 | New SOTA attempt (val_bpb=1.2014) | spokane-way | #52 |
| 1.2026 | Record: 6L depth minimalism U-Net sliding window - val_bpb 1.2025 | alphastar1111 | #1527 |
| 1.2026 | Record: 10L Int5-MLP + Mixed Quant + GradClip + Warmdown3k (mean val_bpb=1.20262) | aniketio-ctrl | #426 |
| 1.2029 | Non-record: Mixed-Int6 LZMA9 B3072 Warm5000 | sabdulmajid | #1438 |
| 1.2029 | Non-record: BitNet b1.58 — 65M ternary params beat 4-hour baseline in 10 minutes (val_bpb=1.2029) | ksang123 | #139 |
| 1.2035 | Non-record: 12L Low-Rank Q + QAT (1xH100, pre-quant 1.2035) | SkywardSyntax | #316 |
| 1.2036 | Record: SEQ_LEN=4096 training | lenguyen1807 | #231 |
| 1.2037 | PROTEUS v4 — non-record submission (val_bpb: 1.2037) | MatoTeziTanka | #368 |
| 1.2045 | Non-record: FP16 embed + WD20k + seq2048 + doc-isolated sliding window (val_bpb=1.2045) | mrdavtan | #151 |
| 1.2050 | Add healing-phase training submission (1.205 val_bpb) | luccifer00 | #1752 |
| 1.2052 | Non-record: QAT ablation — int8 QAT overhead exceeds quantization gap recovery | mrdavtan | #145 |
| 1.2055 | Non-record: Competitive Stack + Phonetic Tokenization Exploration (val_bpb=1.2055, 4xH100) | nalediym | #454 |
| ★ | 1.2058 | SOTA attempt (val_bpb=1.2064) | spokane-way | #49 |
| 1.2058 | Add 1.20 BPB submission with Legal TTT and Calibration (9L/448D) | Vibes-me | #976 |
| 1.2058 | Add 1.20 BPB submission with Legal TTT and Calibration (9L/448D) | Vibes-me | #1038 |
| 1.2064 | Non-record: leader-core valid-eval parity run + 1xH100 proxy screens | simon-marcus | #244 |
| 1.2064 | [Notable Non-Record Submission] To JEPA or Not to JEPA: That Is Le Question (32.8M LeWorldModel Mamba2 Style Text Implementation - 1.2064 BPB ) | CiprianFlorin-Ifrim | #903 |
| 1.2065 | Non-record: Depth Recurrence + XSA + LeakyReLU² (val_bpb 1.2065) | iverbovoy | #784 |
| 1.2066 | Add 1.2066 record: 8L Depth Recurrence by trhgbao | trhgbao | #1472 |
| 1.2067 | [Non-record] Codebooks! - val_bpb 1.2067 (3-seed mean) | mtybadger | #1433 |
| 1.2070 | [Non-record] Scaled Byte-level H-Net matches 4-hour subword-level baseline (H-Net val_bpb = 1.2070) | DariusFeher | #1305 |
| 1.2073 | Record: 1.2073 bpb • 11L gold6 • 8xH100 | pall23-mech | #649 |
| 1.2075 | Non-record: Systematic Hyperparameter Search (val_bpb=1.2075) | nglain | #141 |
| 1.2079 | [Non-Record] LegendreGPT: Legendre polynomial depth parameterization | sergimichi | #1337 |
| 1.2089 | Non-record: Int6 QAT + 11L 512d + Sliding Window, val_bpb=1.2089 | dibdabo | #225 |
| 1.2091 | SwiGLU dim=576 + Sliding Window + Muon WD (1.2091 BPB) | Focus2321 | #163 |
| 1.2092 | Non-record: Add submission track_non_record_16mb/2026-03-23_DepthRecurrent_TTT | SergiuDeveloper | #495 |
| 1.2092 | LeakyReLU + XSA + PartialRoPE + FA3 submission — val_bpb 1.1991 | kjahan | #1427 |
| 1.2093 | [WIP] Record: Hybrid architecture 8L 3:1 GDN/Transformer (val_bpb=1.2093) | phulin | #651 |
| 1.2094 | Non-record: Full Attention + LZMA + small BigramHash (val_bpb=1.2094) | ibarrajo | #1250 |
| 1.2097 | Non-record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.209735 | Abhishek8108 | #1553 |
| 1.2098 | basic submission improving baseline | elad-simbalista | #1748 |
| 1.2101 | Record: Seq2048 training + eval (val_bpb=1.2101) | ibarrajo | #136 |
| 1.2115 | Add Parameter Golf submission: Vocab768_LinearPhaseInit_GatedXSA_EMA_… | shram86 | #1015 |
| 1.2135 | 11L 512d Int8+Zlib Baseline (val_bpb 1.2135, 3-seed) | nickferrantelive | #858 |
| ★ | 1.2139 | Record: 10L Mixed Precision: val_bpb=1.2147 (10 layers + int6 middle layers) | nanlliu | #39 |
| 1.2151 | Byte-Level Tokenizer-Free Transformer: 1.2151 BPB (beats baseline 1.2244) | seanward | #705 |
| 1.2154 | warmdown-quantization val_bpb = 1.2154 | saml212 | #61 |
| 1.2156 | Record (pending): 92-experiment autoresearch + sliding window eval, pre-quant val_bpb=1.2156 | hydeh3r3 | #85 |
| 1.2160 | NTK Eval + Overtone Init (val_bpb=1.2160) | notapplica | #59 |
| 1.2162 | Mixture of Convolutions (MoC): token-adaptive short convolutions via kernel mixtures | andrewmouldon | #966 |
| 1.2164 | Non-record: ASQU activation, Mixture of Convolutions, BankedLinear | andrewmouldon | #679 |
| 1.2168 | Submit 9L 2xMLP optimized parameter run with val_bpb 1.2168 | Git-Aarya | #736 |
| 1.2174 | Record: 11L Adaptive Markov + Int6 Mixed Quant (1.2174 bpb) | Jayteare | #1046 |
| 1.2182 | V2 Prototype: SwiGLU + Dropout + MuonWD + MidLayerLoop | starfly-web | #340 |
| 1.2185 | Add BitNet b1.58 Ternary Quantization (non-record submission) | erikqu | #760 |
| 1.2192 | [Non-Record] Single H100, <16MB, 1.21 BPB | adityasasidhar | #1617 |
| 1.2194 | Aweb Optimized Baseline — 1.2194 BPB | manfromnowhere143 | #181 |
| 1.2196 | Depth Recurrence + Cross-Repeat Skip + Sliding Window Eval | iverbovoy | #148 |
| 1.2196 | Non-record: Annealed Muon 1.58-bit Ternary — val_bpb 1.2196 (8xH100 SXM) | DushyantChetiwal | #1273 |
| 1.2197 | fp16 tied embedding + warmdown/LR tuning (val_bpb 1.2197) | chonchiog | #42 |
| 1.2199 | [notable non record] Train-Time Overparameterization: Better Models Through Transient Expansion | andrewmouldon | #1551 |
| 1.2200 | Non-record: 12L Compression-Aware Training Orchestration with ProxQuant | mollahasani | #1357 |
| 1.2201 | 11L MLP2x + LeakyReLU² + Legal TTT (val_bpb=1.2201, 3-seed mean, std=0.0015) | Programmerryoki | #1057 |
| 1.2206 | Non-record: DP tokenizer beats naive baseline but fails to close gap to SP — 47-run controlled study (1.2206 @ 4096) | Stuckertks09 | #1504 |
| 1.2207 | Non-record: 11L PartialRoPE + LNScale + EMA + SWA + TTT (1xH100 107min, val_bpb=1.2207, 15.4MB) | nathon-lee | #334 |
| 1.2207 | [non-record track] BankLinear: cross-layer shared weight bank | andrewmouldon | #1315 |
| 1.2208 | Record: CUTLASS EVT Backward MLP Fusion + Brotli + Turbo-Muon + Memmap | abaybektursun | #1105 |
| 1.2208 | Proposal: Validate ASQU on the March 22 10min/16MB control line | fahmitech | #1247 |
| 1.2219 | [10min/16MB] TrigramHash + EMA-SWA + Int4 QAT — val_bpb 1.2219 | Ashutosh3142857 | #440 |
| 1.2225 | [non-record track] Hierarchical Shared Attention (HSA): multi-level sharing across attention heads | andrewmouldon | #1264 |
| 1.2236 | [non-record track] BankLinear: cross-layer shared weight bank with learned + random mixtures | andrewmouldon | #812 |
| 1.2240 | Modal 8xH100 LowerLR FP16Embed 960 (val_bpb 1.22395) | kiankyars | #45 |
| 1.2244 | [Partial submission] naive baseline + dispersion loss | ChenLiu-1996 | #34 |
| 1.2244 | Submission: Top-Heavy FFN Allocation + Packed Int6 Export (pending eval) | mr-ashish-panday | #110 |
| 1.2244 | [WIP] Sparse Attention + Recursive Weight Sharing for 16MB Efficiency | albertorkive | #5 |
| 1.2244 | Tier 6: PPM-C eval-time context mixer (standalone + neural mixing) | Cwarren15-A | #283 |
| 1.2244 | Submission/fastattn mtp dr | AVINASH0052 | #1691 |
| 1.2249 | Notable Non-Record: Universal Transformer — 1.2249 BPB — Depth Recurrence with Iteration Embeddings | gowtham0992 | #1110 |
| 1.2252 | Submission/hybrid rwkv token shift | dillon-blake | #1007 |
| 1.2252 | Submission/hybrid RWKV token shift | dillon-blake | #1112 |
| 1.2257 | commit non-record | jupram | #437 |
| 1.2271 | Ultimate recurrent: 21 techniques — depth recurrence, novel ops | MrINVISO | #298 |
| 1.2283 | Add non-record submission for 15L hybrid tail seq1536 20k | shasank0001 | #1346 |
| 1.2286 | saliency-guided local 5090 non-record | liveyourday | #1580 |
| 1.2294 | Non-record: 2026-03-22_SuperchunkBPE_SP1024 | eshansinghal14 | #506 |
| 1.2296 | Add Modal 8xH100 timed validation non-record submission | kiankyars | #41 |
| 1.2299 | Non-record: TWEO early-cosine outlier regularization on SP1024 baseline | PapaFranku4647 | #1636 |
| 1.2302 | Non-record: LeakyReLU² + LAWA + Ramping WD + Val Training (val_bpb=1.2302, 1xH100) | ChideraIbe123 | #675 |
| 1.2314 | Structured Embedding Init (BPB 1.2314) | brandonpf | #1165 |
| 1.2320 | Add record: INT6 10L SWA NorMuon, val_bpb=1.2320 | Akasxh | #204 |
| 1.2321 | Non-record: JEPA v3 — span-masked I-JEPA + VICReg, val_bpb 1.2321 | aiejvn | #1581 |
| 1.2326 | [non_record_16mb] 12L dim=448 LeakyReLU^2 BGVOCAB=2048 GH200 proxy (val_bpb=1.2326) | Okropniak | #1253 |
| 1.2334 | Non-record: Hybrid Depth-Recurrent Transformer + Int5 Quantization Studies | trasnake87 | #288 |
| 1.2355 | Add chasewebb 9x512 sp1024 baseline (val_bpb: 1.2355) | chasewebb | #195 |
| 1.2364 | Non-record: TTT-LoRA Base — HumanAI Convention (val_bpb=1.2364) | humanaiconvention | #600 |
| 1.2370 | Non-record: 9L SwiGLU MLP2 on 8xH100 (val_bpb 1.2370, 15.9MB) | ntwari-bruce | #1428 |
| 1.2374 | Add MaxParams6L_120 submission (1.2374 BPB) to track_non_record_16mb | NishantDahal | #391 |
| 1.2374 | Add MaxParams6L_120 submission (1.2374 BPB) to track_non_record_16mb | NishantDahal | #395 |
| 1.2381 | [Submission] Warmdown Scheduling - 1.2430 BPB on 8×H100 SXM | MajdiZamim | #48 |
| 1.2392 | Non-Record: 8L + BigramHash(12288) + Systematic HyperOpt (val_bpb=1.2392, 1xH100, 129 experiments) | CrimsonSithria | #436 |
| 1.2392 | Add BigramHash: hashed bigram embeddings with optional dim projection | CrimsonSithria | #441 |
| 1.2409 | Non-record: Universal Transformer with Adaptive Computation Time | 5en5e1 | #1293 |
| 1.2417 | Non-record: 7L + BigramHash Projection + Batch Scaling (val_bpb=1.2417, 1xH100) | CrimsonSithria | #393 |
| 1.2421 | Add submission: Mixed Quantization + BigramHash + SWA (val_bpb 1.2421) | SergheiBrinza | #370 |
| 1.2427 | Non-record: 10L mixed int5/int6 export reaches ~10.4MB with strong throughput | simon-marcus | #272 |
| 1.2450 | Add non-record submission: Multi-model cross-attention with dimensional asymmetry | alientony | #1352 |
| 1.2459 | Submission: val_bpb=1.2459 (autoresearch-optimized) | joeynyc | #343 |
| 1.2498 | Single H100, 10 min, <16MB, 1.24 bpb | adityasasidhar | #1559 |
| 1.2500 | Blackwell local nonrecord | pall23-mech | #793 |
| 1.2519 | Non-record: GatedDeltaNet, 32K Context, Document-Boundary State Reset | brian386 | #939 |
| 1.2529 | Non-record: Cache LM + LoRA TTT (negative result on cache, positive on TTT) | anantdgoel | #183 |
| 1.2540 | Non-record unlimited-compute: 1-hour 1xH100 warmdown 9x512 | aamodbhatt | #111 |
| 1.2542 | Non-Record Universal Transformer submission. (2x Attention layers, 3 Layer MLP, depth scheduling) | serdardoesml | #1088 |
| 1.2552 | Non-Record : Hybrid XSA-SSM | Jash-Vora | #1524 |
| 1.2552 | Non-Record 16 MB Track : Hybrid XSA-SSM | Jash-Vora | #1525 |
| 1.2554 | Non-record: wip Random-Basis MLPs + LoRA | camden-git | #1684 |
| 1.2604 | Add baseline and depth recurrence submissions (1xH100 20min runs) | henrycashe26 | #822 |
| 1.2622 | Add non-record JEPA byte-level encoder-decoder submission | gravelBridge | #696 |
| 1.2623 | [Non-record] Azure 1xH100 frontier-family engineering run (val_bpb=1.2623) | micoverde | #580 |
| 1.2634 | Non-record: LeakyReLU + Sliding Window Eval + Zstd compression | NICOH-YAY | #1390 |
| 1.2639 | Add LoRA exploration non-record archive | reyhandl | #1439 |
| 1.2659 | Non-Record: First Viable 3-Loop Recurrence — Birkhoff + Output-LN + Timestep Scaling (val_bpb=1.2659, 14 eff layers from 6 unique blocks) | aazizyan | #855 |
| 1.2663 | Non-record: Depth-recurrent 5x3 d768, val_bpb=1.2663 | JackYoung27 | #30 |
| 1.2663 | Non-record: Depth-recurrent 5x3 d768, val_bpb=1.2663 | JackYoung27 | #31 |
| 1.2697 | Optimized SOTA Submission: 1.2697 bpb | vavo | #46 |
| 1.2699 | [Non-Record] JEPA Baseline — LLM-JEPA pretraining — 1.2699 bpb | IshiPareek | #1480 |
| 1.2699 | [Non-Record] Modified LLM-JEPA pretraining from scratch — 1.2699 bpb; add int6 quantization + lzma | IshiPareek | #1654 |
| 1.2701 | [WIP] add combined optimization, waiting for 8 gpu train | Billy1900 | #131 |
| 1.2716 | Non-record: Depth Recurrence 5x3 — Weight-Shared Looping Transformer (6xH200, val_bpb=1.2716) | Arth-Singh | #319 |
| 1.2734 | Non-record: Diffusion-Noised Teacher AR Hybrid (val_bpb=1.2734, 8xH100) | anthony-maio | #904 |
| 1.2767 | Non-record: 10Layer + BigramHash + SWA + Attention-Residuals | AtomChen0425 | #632 |
| 1.2771 | Non-record: 1xH100 warmdown100 30m scaling run | aamodbhatt | #501 |
| 1.2774 | Non-record: 1xH100 Budget Run — SmearGate + BigramHash + MLP3x (1.2774 BPB) | tsubasagit | #1463 |
| 1.2781 | Non-record submission: HELIX and HELIX MoR K7R2 U-Net (architecture report + finalized metadata) | sayujshah | #1600 |
| 1.2791 | Non-record: trigram phrase-memory ablation on 1×H100: negative result (1.2791 BPB best) | maxwellcipher | #571 |
| 1.2824 | WSD Cosine Decay Schedule + 10L Int5-MLP BigramHash SmearGate SWA | ShihChunHao | #744 |
| 1.2824 | submission/2026-03-25_WSD_CosineDecay_Schedule | ShihChunHao | #791 |
| 1.2826 | WIP: LeakyReLU(0.5)² MLP on 11L EMA + GPTQ-lite stack (`track_10min_16mb`) | tejas-goyal | #1051 |
| 1.2827 | Non-record: Custom sp4096 BPE Tokenizer (1.2827 BPB on 1×H100) | Nishu2000-hub | #293 |
| 1.2831 | Non-record: 11L GQA + MLP 3 + Partial RoPE + Int6 Attn/MLP + QAT40 | adityasasidhar | #1085 |
| 1.2834 | Non-record: GradPower for Muon prefers p<1 in matched H100 ablation | PapaFranku4647 | #1682 |
| 1.2838 | [Non-record] MLA + SmearGate + BigramHash + SWA — pre-quant 1.2838 bpb | Skrisps26 | #354 |
| 1.2882 | Non-record: Meta-TTT + Cache/OGD Eval Stacking + Tokenizer Ablation | anantdgoel | #384 |
| 1.2890 | Non-Record: QAT + NTK-4096 Eval + Cosine Warmdown + Aggressive SWA (val_bpb=1.2890, 1xh100) | crony-io | #326 |
| 1.2907 | Non-record: GatedDeltaNet SSM via fla library — 1.2907 bpb, 15.79MB | dnldsz | #969 |
| 1.2907 | Non-record: GatedDeltaNet SSM via fla library — 1.2907 bpb, 15.79MB | dnldsz | #970 |
| 1.2917 | Add CTM tail-QAT proxy non-record snapshot | KHUCHAN | #193 |
| 1.2917 | Non-record: Lottery Ticket Hypothesis with a few float parameters | Abhishek-Dalvi410 | #1753 |
| 1.2919 | Non-record submission: Depth-Recurrent U-Net Transformer | Muhammad-Ahmed-Rayyan | #1387 |
| 1.2928 | feat: Add non-record dense 2048 sliding-window ablation submission | abhishekrajdhar | #460 |
| 1.2947 | Improve baseline with LeakyReLU² activation | JianYan11 | #1131 |
| 1.2982 | Non-record: hybrid spiking Transformer (SNN) with a multi-step spiking MLP | tsbiosky | #664 |
| 1.2987 | Non-record: Warmdown-Tuned Training (val_bpb=1.2987) on 1xRTX 5090 | swapp1990 | #146 |
| 1.2988 | Crystal Curriculum — TF-IDF curriculum learning by Bee Bytez | jamesrziggy | #242 |
| 1.3003 | Non-record: HyperparamTuned KV2 + FP16 Embed | xexyz | #271 |
| 1.3029 | Add non-record 1xH100 budget seq2048 run (val_bpb 1.3029) | Aniket-pd | #1261 |
| 1.3036 | RECORD: Denseformer+VRL+XSA on last 4 layers+Gradient Clipping (pending 8xH100 eval) | grim-hitman0XX | #862 |
| 1.3036 | Non-record: LeakyReLU² + BigramHash + Int5/Int6 + SlidingWindow — val_bpb 1.3036 (1×H100) | Syed-M-Zeeshan | #1027 |
| 1.3038 | Add non-record submission for 15L ternary MLP-only 20k | shasank0001 | #1347 |
| 1.3039 | Mixed INT5/INT6 QAT from step 1 (1.3039 bpb) | BruhTheMomentum | #1417 |
| 1.3043 | Non-record: Wider-shallower 4x768 + QAT (1xH100, 1.3043 bpb) | dttdrv | #185 |
| 1.3055 | Non-record: Data ordering & selection — negative result on FineWeb | abaybektursun | #772 |
| 1.3069 | Trunghiu | Hieuabssy | #1136 |
| 1.3069 | LeakyReLU(0.5)^2 + GPTQ + EMA + BigramHash (1.3069) | Hieuabssy | #1137 |
| 1.3069 | [Non-Record] 5L MLP×4 EMA=0.97 Optuna — GH200 proxy, val_bpb=1.3069 (int6+zlib) | Okropniak | #1174 |
| 1.3081 | Add non-record v1 1xH100 LeakyReLU GPTQ-lite submission | hypnoastic | #1444 |
| 1.3092 | Submission Record Series: BatchOpt+MLP4+RoPE100k and 8L EMA Int6 Bigram65k on Single 20GB GPU (val_bpb 1.7810 → 1.3092) | markste-in | #759 |
| 1.3151 | Non-record: Neuromodulatory Depth-Recurrent Transformer with FiLM-only TTT (WIP, val_bpb=1.3151) | nirmathur | #1383 |
| 1.3162 | Non-record: FP16 embed + MLP992 sliding-window size-repair probe | THUQiXuan | #497 |
| 1.3178 | Submit 2026-03-27_PhaseCoherenceGatedGradients | jzgdev | #949 |
| 1.3178 | 2026-03-27_PhaseCoherenceGatedGradients submission | jzgdev | #950 |
| 1.3178 | submission 2026-03-27_PhaseCoherenceGatedGradients PIC-GID + ParallelMuon | jzgdev | #984 |
| 1.3193 | Log MPO tensor train baseline at r=16 (1.3193 BPB) | chinmaypatwardhan-ops | #1078 |
| 1.3208 | Add non-record 1xH200 fp16-embed baseline sweep submission | itu-itis24-buyukhelvacigilm24 | #407 |
| 1.3220 | Frozen Random Backbone + LoRA Adapters (1.322 BPB) | dljr-github | #1548 |
| 1.3220 | Non-record: Frozen Random Backbone + Rank-304 LoRA Adapters (val_bpb 1.3220) | dljr-github | #1549 |
| 1.3223 | Non-record: 30 experiments across 13 architectures (MLA, Pause Tokens, Eigenweight, 9 exotic ideas) | nnm2602 | #1589 |
| 1.3246 | Evolutionary NAS on only a 5 year old MacBook; within 10% of baseline | mike-ferguson | #1627 |
| 1.3250 | Non-record: MC Dropout ensembling is negative for small LMs | abaybektursun | #1021 |
| 1.3262 | Non-record: Ternary MLP Quantization — Void Fraction (val_bpb 1.3262, 10.9MB) | G3sparky | #1733 |
| 1.3267 | Record: 11L Int6 QAT + Warmdown (val_bpb=1.3267, 1xH100) | pkim02 | #488 |
| 1.3274 | Add baseline H100 training report and process docs | xuafeng | #292 |
| 1.3276 | [codex] Validate sliding-window post-quant evaluation on 1xH100 proxy | Kevxn97 | #260 |
| 1.3281 | Non-record: SwiGLU + warmdown fix + quarter batch (1x5090, 1.3281 bpb) | NishantDahal | #73 |
| 1.3286 | PTQ int6-attn + int5-mlp, 20L×256d, mlp=5 — val_bpb 1.3286 | PavelPaha | #1543 |
| 1.3288 | Non-record: Hyperbolic Q/K Lite 1xH100 exploration package | ldh-at | #1074 |
| 1.3299 | Non-record: JEPA-LM Latent Predictive World Model (first JEPA submission) | adi-suresh01 | #1312 |
| 1.3319 | Non-Record: 10L Spatio-Temporal SNN (BPB: 1.3319) | ieuko | #1367 |
| 1.3321 | Add Compiled LeakyReLU2 + Slide64 Eval non-record submission | SHN2004 | #1063 |
| 1.3323 | Add Hybrid Depth-Recurrent Transformer submission | tobiascanavesi | #341 |
| 1.3342 | Record: Depth-Recurrent UT + Rank-1 LoRA Per-Iteration Adaptation — val_bpb 1.3342 | vimeto | #1096 |
| 1.3346 | Muon Optimizer Tuning: val_bpb 1.3346 by jeremyschied | jeremyschied | #794 |
| 1.3355 | Parameter golf jepa | danielxmed | #1097 |
| 1.3358 | Non-record: Stacked hyperparameter tuning + eval2048 (RTX 5090, val_bpb 1.336) | gwelinder | #104 |
| 1.3365 | Non-record: 10L MLP3x + Muon — val_bpb=1.3365 — Single Colab GPU | Durlabhkumarjha | #1190 |
| 1.3379 | Causal Oscillator LM: physics-native architecture (BPB 1.34) | rolandnsharp | #1061 |
| 1.3380 | Non-record: 5L MLP4x + SlidingWindow + SWA + QAT — val_bpb 1.33 (1xH100) | JUSTSUJAY | #842 |
| 1.3428 | Non-record: MDLM Masked Diffusion + Depth Recurrence — val_bpb 1.3428 (8×H100, seed=1337) | He-Wenhao | #1582 |
| 1.3434 | (Non record) 11L Frontier MixedQuant Trigram | armmer016 | #570 |
| 1.3440 | Non-record: Sliding-Window Evaluation + Int8-Zlib Compression (1.34 bpb) | swetapaul08 | #1133 |
| 1.3440 | Non-record: ALBERT-Style Low-Rank Embedding Factorisation (ablation study, 1×H100) | Cayton-Tech | #1481 |
| 1.3441 | EBLS Learned Sharing (10min/16MB) | Robby955 | #433 |
| 1.3446 | Submission: Low-Rank All-Attention (1.3446 bpb) | CRouvroy | #226 |
| 1.3458 | Non-Record: Replace Muon optimizer with NorMuon for baseline (1xH100) | stevenshinechen | #438 |
| 1.3479 | Non-record submission: baseline_sp1024, val_bpb=1.3479(on single H100), AbhiShet108 | AbhiShet108 | #1713 |
| 1.3485 | Non-record: MDLM Masked Diffusion (1.3485 BPB) | Rhoahndur | #1403 |
| 1.3486 | Non-record: Warmdown fix (9x512) on 1xH100 10m | aamodbhatt | #94 |
| 1.3496 | Non-record: ByteJEPA — True Byte-Level JEPA (val_bpb 1.3496) | hardik-bhadani-git | #1443 |
| 1.3509 | Add Parameter Golf submission: Depth12 Dim416 KV4 | AntDX316 | #71 |
| 1.3510 | Add non-record local A100 TTT eval-stride0 submission | DanishjeetSingh | #285 |
| 1.3515 | Grant nonrecord tied blocks | Jaksenc | #717 |
| 1.3517 | Add MPK 8x384 10-minute submission record | DJLougen | #144 |
| 1.3525 | Attention Warm-Start: Initializing Q/K from Bigram Co-occurrence SVD | SPThole | #678 |
| 1.3527 | Non-record: 21L PRP experiment | maksblu | #1235 |
| 1.3529 | Add local baseline reproduction record | bjbjbjbjbjbj | #346 |
| 1.3538 | [non-record] 1xH100 screening: compression + eval strategy | numb3r33 | #938 |
| 1.3540 | Add 128-cluster baseline submission files | danielweidinger2299-debug | #985 |
| 1.3556 | Seq2048 + torch.compile + mid LR (1xA100 draft) | C0neF | #746 |
| 1.3557 | Add non-record recurrent 0011+g2 R768 submission | raider99k | #1203 |
| 1.3557 | [Non Record] Online Curriculum Learning | SPThole | #737 |
| 1.3560 | Non-record: Fused Triton Megakernels — RMSNorm + LeakyReLU² (val_bpb 1.3560) | dentity007 | #1192 |
| 1.3565 | Parallel-Residual+SwiGLU+11layer | Pravin-dev06 | #1751 |
| 1.3571 | Non-record: BESE + Mamba-3 SSD Hybrid (1.3571 BPB, 7.6 MB artifact) | mrbese | #1665 |
| 1.3572 | Add PartialRoPE 16/64 experiment records | inFaaa | #1144 |
| 1.3576 | [Non Record] Fractal recurrent primitive hybrid - SP1024 1xH100 | abbudjoe | #1569 |
| 1.3579 | non-record: MASA low-rank shared attention + SwiGLU, 1.3579 BPB | Zagot-byte | #1025 |
| 1.3587 | Non-record: H-Net Dynamic Chunking — Learned Tokenization Layer (val_bpb 1.3587) | dentity007 | #1191 |
| 1.3587 | [Non-Record] SSM8: Fat State Mamba SSM, BPB=1.3587 | KRGulaj | #1574 |
| 1.3595 | [Non-record] 1-Stage Byte-level H-Net at 17.5M: Dynamic Chunking Learns Word Boundaries (39x-91x fewer params than H-Net paper) | DariusFeher | #1104 |
| 1.3600 | Submission/2026 03 28 masked diffusion | ikermoel | #1053 |
| 1.3620 | submission: LeakyReLU² + EMA + BigramHash(20480) + MLP3.5x | aptsalt | #941 |
| 1.3629 | Non-record: LapushBaby stock baseline 1xGPU RunPod | LapushBaby | #630 |
| 1.3631 | [Non-Record] QAT Dead-Code Analysis + 7 Novel Technique Sweep (1xH100) | wfproc | #1032 |
| 1.3639 | Notable Non-Record: H-Net Dynamic Chunking — 1.3639 BPB — Learned Content-Dependent Segmentation | gowtham0992 | #1168 |
| 1.3660 | Non-record: 1.366 BPB Baseline (SmearGate + Muon, int6, zstd) | nitSubedi | #567 |
| 1.3680 | Non-Record: Full-Model Depth Recurrence Ablation — 7 configs, torch.compile penalty = 0 | codeprakhar25 | #1449 |
| 1.3684 | Add 11L 448x2 PairHash int8+zstd 10-minute submission record | FyeJordy | #749 |
| 1.3693 | Add shared-block recurrent 10-minute non-record 16MB submission | LocalX991 | #1349 |
| 1.3693 | Non-record: Compact 12x384 1xH100 10m | aamodbhatt | #93 |
| 1.3705 | Notable Non-Record: Learning Adapters on Random Linear Maps — 1.3705 BPB | gowtham0992 | #1113 |
| 1.3736 | Submission: val_bpb=1.3736 — 10 layers + Muon + mlp_mult=3; update default NUM_LAYERS and MLP_MULT values | Durlabhkumarjha | #1167 |
| 1.3762 | Non-record: LeakyReLU(0.5)^2 + TrigramHash on PR414 stack (1.3762 bpb, 1xA100) | IshiPareek | #882 |
| 1.3797 | Add non-record 16MB layers7 submission | akshai0296 | #125 |
| 1.3825 | Add non-record submission: 8xH100 FineWeb baseline + TTT eval (val_bpb 1.3825) | sicauzxl | #196 |
| 1.3827 | Record: Gated Residual Scaling (Token-wise) for Attention + MLP - 1.3827 BPB | souro26 | #1671 |
| 1.3868 | Record Submission: Poly5 Softcap + Z-Loss + YaRN + Zstd-22 + Stride-16 (on PR #549 stack) | monisha-max | #1325 |
| 1.3874 | 11L INT7 + MuonWD + SWA (preliminary) | jorge-asenjo | #1258 |
| 1.3900 | Record: Doc-Isolated TTT + Eval Optimizations | vivekvar-dl | #964 |
| 1.3921 | Non-record: 1x H100 SXM5 Explorations | User123331 | #1608 |
| 1.3932 | Non-record: Mixture of Softmax K=2 R=64 (1xH100, 10min, 1.3932 bpb) | User123331 | #266 |
| 1.3969 | Non-Record v2: 7L UNet + Int8 QAT + EMA + Long Train — 1.3969 BPB (DGX Spark) | AlirezaAlampour | #1606 |
| 1.3971 | Non-Record: CAT, Sparsity (Structured and Hessian-Guided), MoE, KAN Negative Results | pireylow | #1537 |
| 1.3999 | Record: LeakyReLU² + XSA4 + LN Scale + Partial RoPE — val_bpb 1.3999 | Programmerryoki | #827 |
| 1.4016 | Submission: aria-redefine-qbit — Hybrid Recurrent U-Net (1.40 BPB) | redefine-qbit | #1577 |
| 1.4054 | [Non-Record] H-Net with Dynamic Sequence Chunking | TimS-ml | #992 |
| 1.4061 | Depth-recurrent transformer: shared block × 12 passes, val_bpb 1.4061, 4.39MB artifact | Sambhav242005 | #386 |
| 1.4072 | Hybrid INL + Sort-Split MoE (1.41/1.46 bpb TTT, 15.5MB, 1xH100) | Complexity-ML | #377 |
| 1.4078 | Submit 1x A100 QAT Fix - 1.4078 BPB (Non-Record) | Shuvam-Banerji-Seal | #527 |
| 1.4078 | Submit 1x A100 QAT Fix - 1.4078 BPB (Non-Record) [v2] | Shuvam-Banerji-Seal | #707 |
| 1.4078 | Submit 1x A100 QAT Fix - 1.4078 BPB (Non-Record) [v3] | Shuvam-Banerji-Seal | #712 |
| 1.4096 | Non-record: LeakyMixer: 11L leaky_relu(0.5)^2 + backoff n-gram mixer | zlxi02 | #830 |
| 1.4100 | Attempt: QK gain 5.5 + deeper recurrence | Vickyrrrrrr | #1616 |
| 1.4106 | Add non-record local A100 PR60-stack reproduction | DanishjeetSingh | #284 |
| 1.4120 | [WIP][non-record] 8L/448 width branch local results | andyluo22 | #588 |
| 1.4182 | Non-record: 24.7M params · int6 · Binary U-Net/SmearGate/BigramHash · 1.5hr · RTX 5060 Ti 16GB | randy06122001-boop | #997 |
| 1.4192 | Non-record: Autoresearch-Guided Optimization — 100+ Experiments + Negative Results | Park-Tae-Hwan | #1418 |
| 1.4222 | non-record 16MB A100 SXM run (10L mixed int5/int6 + EMA + QAT) | zeal175 | #619 |
| 1.4233 | Non-record: Fibonacci Manifold Network + Hybrid Attention + Sparse Braid | Jaredcastorena | #1650 |
| 1.4239 | Add non-record 4090 warmdown submission | SHN2004 | #716 |
| 1.4242 | BSM (Bounded State Manifold) - A box intersection non-transformer architecture, 1.4242 val BPB | dheeren-tejani | #1067 |
| 1.4245 | Non-record: QAT + Neural Cache + LoRA TTT | Bortlesboat | #304 |
| 1.4315 | Add Kshitij submission (1x H100, val_bpb 1.4315, env-based config) | singhaikshitijjain | #994 |
| 1.4352 | Non-Record: JEPA-NTP Auxiliary Losses (Negative Result) | sidhanth97 | #1556 |
| 1.4370 | Record: 11L MLP3x + SmearGate + Error Correction Table | kellyvv | #108 |
| 1.4370 | Record: 11L MLP3x + SmearGate + Error Correction Table | kellyvv | #232 |
| 1.4390 | Non-record: Universal Transformer + Adaptive Density (val_bpb 1.4390) | dentity007 | #1193 |
| 1.4444 | Record: 10-Layer 4xMLP (val_bpb: 1.4444) | hmhm0 | #228 |
| 1.4447 | Notable Non-Record: JEPA — 1.4447 BPB — Joint Embedding Predictive Architecture for LLMs | gowtham0992 | #1116 |
| 1.4457 | [Non-Record Submission] CompressedUT CE + EMA Export + Export-Aligned Late QAT (1.4457 BPB) | mihir-s-05 | #937 |
| 1.4465 | Non-record: Compressor-Aware Training (CAT), differentiable compression proxies for LZ-family compressors | korentomas | #1385 |
| 1.4479 | Non-record: PROTEUS Feature Ablation - Parallel Residuals + Mixed INT5/INT6 + TTT on DGX Spark GB10 | dentity007 | #1425 |
| 1.4500 | [Architectural Proof-of-Concept] Saliency-Boosted GPTQ & High-Entropy Routing | Subramanyam6 | #1558 |
| 1.4508 | Non-record: LeakyReLU(0.9)² slope sweep (local validation, compute pending) | yaowubarbara | #1062 |
| 1.4525 | Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB) with ablations | anantdgoel | #413 |
| 1.4530 | Non-record submission: MLP3x 9L 512d, 1.4530 bpb (1xRTX 4090) | ivanontech | #854 |
| 1.4536 | [Non-Record] MLP3x + WD0.04 + OrthoInit + Sliding Eval — 1.4536 BPB | AymanMahfuz27 | #444 |
| 1.4537 | Add V22 Int6 fast-converging 16MB model (~8min on RTX 4090) | mini-sarami | #1531 |
| 1.4574 | [Non-record] MHALM v1 (1.4574 bpb) | aquemy | #476 |
| 1.4584 | Notable Non-Record: Text Diffusion (MDLM) — 1.4584 BPB — Masked Diffusion Language Model | gowtham0992 | #1119 |
| 1.4612 | Non-record: 11L + XSA4 H100 frontier (1.4612 BPB legal) | chrisnkuno | #554 |
| 1.4617 | [Non-record] JEPA v2 — Why same-sequence next-k JEPA collapses in causal LMs | luciobaiocchi | #1330 |
| 1.4689 | Non-record: Depth Recurrence Sweep — Systematic Layer Loop Ablation | krishs0404 | #1726 |
| 1.4702 | experiments: MODEL_DIM=256, MLP_MULT=3, WARMDOWN fix - best bpb 1.4702 | 0xtigerclaw | #618 |
| 1.4707 | [Non-record] TTT-E2E: Meta-learned test-time training via FOMAML | abaybektursun | #1222 |
| 1.4709 | Non-record: Olmo Hybrid (GDN + Attention) for long-context training — 8k/16k/32k crossover study | aarjunsrinivasan | #1371 |
| 1.4716 | [Submission] SwiGLU MLP (under 16MB) | Abhinav-Avasarala | #1391 |
| 1.4716 | [Submission] SwiGLU MLP (under 16MB) | Abhinav-Avasarala | #1393 |
| 1.4750 | 11L AttnRes + Gated Attention + Looped Blocks + EMA + Cosine + QAT | Neopolita | #607 |
| 1.4765 | Non-record: Nemotron-H Mamba-3 Hybrid + First SSM Depth Recurrence (1.4765 BPB) | inin-zou | #1607 |
| 1.4775 | Non-record: BigramHash(4096) + Cosine EMA + LZMA-9 | Alfaxad | #681 |
| 1.4784 | First submission | markste-in | #408 |
| 1.4816 | Submission: Mamba SSM byte260 | nicholasbailey87 | #1342 |
| 1.4831 | Non-record: 19.2M MDLM Text Diffusion: fp8 e4m3 + EMA 0.999 + Muon LR 0.02 | lsb | #1699 |
| 1.4841 | Non-record: 28 Experiments in 5 Days — What Works, What Fails, and Why Small-Scale Tests Lie | himanshudongre | #1227 |
| 1.4893 | Non-record: Sliding Patch Attentions + MoE (2-layer compact run) | BurguerJohn | #981 |
| 1.4942 | Add TRN hybrid non-record submission (1.4942 bpb, 1x RTX 5090) | amabito | #669 |
| 1.4963 | Non-record: Basis Block Interpolation (novel negative result) + Hyperparameter Sweep (MATRIX_LR=0.03 improves SOTA by 0.059 bpb) | j420 | #530 |
| 1.5000 | Non-record: Looped Transformer + LoRA + Skip Connections + NorMuon + SWA + Int6 + Sliding Window | MatthewHRockwell | #103 |
| 1.5000 | T5: Phase-Based Depth Recurrence + MLA + Graduated Precision (Non-Record) | Jonas-T5 | #739 |
| 1.5080 | Non-record: Retrodiction Training (Petz Recovery Map) — val_bpb 1.508 | akaiHuang | #1183 |
| 1.5080 | Non-record Submission: Text Diffusion + Retrodiction + TTT + Depth Recurrence | akaiHuang | #1255 |
| 1.5096 | Non-record: MLX tuned hyperparameters — 1.5096 BPB local (H100 pending) | seekerPrice | #1612 |
| 1.5096 | Non-record: CUDA port of PR #1612 recipe (H100 pending) | seekerPrice | #1614 |
| 1.5134 | Review: Rerun of #972 with actual full-vocab normalization | AnirudhRahul | #978 |
| 1.5140 | Non-record: Family 1A tied blocks (1xH100 dev snapshot) | jaksenc | #536 |
| 1.5164 | [Non-record] Quantization Findings: SWA Reversal + Int5 Failure | kellyvv | #238 |
| 1.5194 | [Non-record] H-Net MAMBA Outer-Layer Ablation: OL2 collapses, OL1 converges to 1.5194 INT6 BPB | aiejvn | #1757 |
| 1.5207 | Non-record: 131 Systematic Experiments — 1.5207 BPB on RTX 4000 Ada | ranausmanai | #1434 |
| 1.5248 | Non-record: 1xH100 auto-research int6 policy sweep | aamodbhatt | #502 |
| 1.5252 | Submit 1x A100 QAT Fix - 1.5252 BPB (Non-Record) [v4] | Shuvam-Banerji-Seal | #719 |
| 1.5252 | Submit 1x A100 QAT Fix - 1.5252 BPB (Non-Record) [v5] | Shuvam-Banerji-Seal | #725 |
| 1.5252 | Single A100 QAT Performance Fix (fresh review cycle) | Shuvam-Banerji-Seal | #751 |
| 1.5275 | Non-record: TrigramHash — iso-parametric bigram(96)+trigram(32), val_bpb=1.5275 (1xH100) | fleeb83 | #504 |
| 1.5283 | RQZ-Golf v1: Depth recurrence for parameter efficiency | TheCause | #54 |
| 1.5295 | Add non-record 1x5090 autoresearch submission with two-campaign analysis | jadechip | #432 |
| 1.5348 | Non-record: TernaryRecurrentGPT - ternary 1.58-bit MLP + depth recurrence (1xL4 val_bpb=1.5348) | Parswanadh | #559 |
| 1.5363 | Submission: Recursive Layer Sharing (13.9 MB, 1.53 BPB) | negrurv | #1542 |
| 1.5364 | Applied Async Prefetching Boost Performance of Any Approach | SirSaltySalmon | #785 |
| 1.5382 | Non-record: TTT + QAT on Consumer GPU (val_bpb=1.5382) | Dannybc123 | #263 |
| 1.5390 | [Notable Non-Record Submission] Everything Everywhere All in One Bit: XNOR-mally I'd use floats - 118M XNOR-Net - 1.539 BPB - 10-Min and Unconstrained Runs | CiprianFlorin-Ifrim | #1388 |
| 1.5406 | [Non-Record] Geodesic Topological Tokenizer + BigramHash + BRN | skar07 | #1571 |
| 1.5516 | Non-record: 1x RTX 3090 baseline run (sp1024, 1 shard) | meett07 | #405 |
| 1.5546 | [Non-Record] 26.5M Int6 QAT + EMA (Pending Compute) | DevWizard-Vandan | #1436 |
| 1.5568 | Non-record: Blueprint Stack + ProgSeq + Multi-scale RoPE + ByteEmbed — val_bpb 1.5568 (1xRTX 3080) | Blakethefn | #1411 |
| 1.5633 | Mamba-3 SSD + Attention Hybrid with QAT (1.5633 bpb) | mradassaad | #1107 |
| 1.5645 | [record] LR warmdown v1 (WARMDOWN_ITERS=900) with confirmed 10-min runs | intelligentiaomni | #1163 |
| 1.5672 | Non-record: Mac mini M4 16GB, no H100s, still golfing (val_bpb=1.5672) | frido22 | #643 |
| 1.5879 | submission: QK Gain Init 1.2 + Sliding Window Eval (stride=64) | outsourc-e | #259 |
| 1.5890 | Depth recurrence: 3 unique layers x 3 loops, 1.589 BPB | koushikkethamakka | #91 |
| 1.5918 | [Non-Record] Whirlpool v5b — Non-Euclidean Lorentzian Attention on the Hyperboloid Manifold | tmancino | #1239 |
| 1.5992 | Add non-record 16MB submission: Hybrid Sparse Diffusion 2H on 8xH100 | ymrohit | #1198 |
| 1.6004 | Non-record submission: recurrent 512 L3 6k (8x H100, 224s) | estesryan | #213 |
| 1.6070 | Non-record: Random Linear Maps + Learned Adapters (val_bpb=1.607, 1.92MB artifact) | fielding | #874 |
| 1.6110 | Non-record: Faithful Conditional Memory | wisebreadloaf | #1490 |
| 1.6114 | Non-record: local RTX 4070 SP1024 8x512 KV4 seq768 500-step run | riatzukiza | #247 |
| 1.6130 | Radial bitnet submission | rthgit | #435 |
| 1.6200 | N-gram logit boost + HedgeMixer + score-first TTT | haimianbaobao007 | #1014 |
| 1.6231 | Non-record: local RTX 4070 SP1024 8x512 KV4 500-step run | riatzukiza | #248 |
| 1.6252 | [non-record] Masked Diffusion Language Model (val_var_bpb=1.625) | mtybadger | #820 |
| 1.6323 | SP8192 + Depth Recurrence + Parallel Residuals (14.09MB) | dippatel1994 | #1499 |
| 1.6371 | records: add 2026-03-27 TrigramHash run, harden quantization safety, and clean docs | Mister2005 | #1201 |
| 1.6372 | Non-record: Muon-Aware QAT + LAWA + Adaptive LR Scheduling (7 toggleable improvements) | mohosy | #130 |
| 1.6507 | Add non-record submission: faithful KV-cache quantization backends on 1x RTX 3090 | LucasErcolano | #1149 |
| 1.6507 | Non-record: Triton KV-cache backend for autoregressive eval | LucasErcolano | #1153 |
| 1.6542 | Non-record: Random Linear Map Adapter Projections — 1.21MB artifact (val_bpb=1.6542) | anthony-maio | #974 |
| 1.6572 | Non-record: local RTX 4070 SP1024 7x512 KV4 500-step run | riatzukiza | #258 |
| 1.6577 | Non-record: local RTX 4070 shared-depth RMS interface v0 | riatzukiza | #276 |
| 1.6644 | Submit Lim Shiaw Yong: 1.66 BPB 12MB Squeeze Architecture | shiawyonglim | #1620 |
| 1.6656 | Non-Record: U-Net Transformer + Int8 QAT + LeakyReLU² + Muon — 1.6656 BPB (DGX Spark) | AlirezaAlampour | #1484 |
| 1.6656 | Non-Record: U-Net Transformer + Int8 QAT + LeakyReLU² + Muon — 1.6656 BPB (DGX Spark) | AlirezaAlampour | #1486 |
| 1.6660 | Non-record: local RTX 4070 SP1024 7x512 KV2 500-step run | riatzukiza | #240 |
| 1.6768 | Varun - 1st Submission | Mister2005 | #1200 |
| 1.6924 | Non-record: Faithful mHC-lite | wisebreadloaf | #1491 |
| 1.7195 | Non-record: knowledge distillation teacher-student submission | Jeneesh1014 | #1034 |
| 1.7232 | non-record: LR warmdown on 1x A40 (1.723 bpb, 8.40MB) | my-sonicase | #313 |
| 1.7270 | Non-record: GPTQ-lite Scale Clamp Fix + 6-bit Packing + Depth Recurrence on Stack B | Rome-1 | #1389 |
| 1.7510 | Non-record: BitNet b1.58 + depth recurrence + NorMuon (1.7510 BPB, 3.78 MB) | Athenox14 | #126 |
| 1.7622 | Non-record: JEPA Hybrid — first latent-prediction LM (1.7622 BPB, 7.5MB) | butbutt42 | #1685 |
| 1.7757 | Non-record: Polar STE QAT for structural weights | LucasErcolano | #1154 |
| 1.7942 | Non-record: Connectome-JEPA — Sparse I/O Bottleneck (val_bpb 1.7942 ± 0.003, 1.97MB artifact, first JEPA submission) | ericdatum | #1152 |
| 1.8111 | Add classical doc-copy 16.3M lzma submission | Muhtasham | #902 |
| 1.8184 | Non-record: 1.8184 BPB Single-step Recurrent Transformer with Q-LoRA (Windows 3090) | Ribin545 | #1299 |
| 1.8184 | Non-record: 1.8184 BPB Single-step Recurrent Transformer with Q-LoRA (Windows 3090) | Ribin545 | #1300 |
| 1.8338 | Non-record: PR315 repro on 1xH100 PCIe, int6+zstd (val_bpb=1.8338) | sjp611 | #356 |
| 1.8389 | Add 10L 4K long-context negative-result submission | takoyakisoft | #237 |
| 1.8440 | Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440) | cschubiner | #56 |
| 1.8480 | [WIP] SSM LRU Baseline — First State Space Model Submission | timothywangdev | #220 |
| 1.8522 | Non-record: DART - Differential Attention Recurrent Transformer (Student submission, Kerala) | anandks2006 | #345 |
| 1.8587 | Non-record: Prefix-Conditioned Suffix Diffusion — True Discrete Diffusion (diffusion_pll_bpb=1.8587) | anthony-maio | #905 |
| 1.8658 | Non-record JEPA submission: VRS (Void Rescue System) | ikermoel | #1513 |
| 1.8698 | Depth Recurrence: 3x3x1024 (non-record, pending H100) | Marvbuster | #79 |
| 1.8989 | H-Net: First Learned Byte-Level Tokenization (README Wishlist) -- 1.90 BPB, 22M params | greqone | #1044 |
| 1.8990 | Non-record: Skill Forge — Autonomous ML Experimentation System (Local RTX 4070) | FlynnCruse | #645 |