QAT (Quantization)

- Used in: 123 PRs
- Best BPB: 0.1653
- Avg BPB: 1.2017
Submissions
| PR | Author | BPB |
|---|---|---|
| #30 | JackYoung27 | 1.2663 |
| #79 | Marvbuster | 1.8698 |
| #117 | trovatochris | 1.1702 |
| #126 | Athenox14 | 1.7510 |
| #130 | mohosy | 1.6372 |
| #145 | mrdavtan | 1.2052 |
| #160 | ChaseWNorton | 1.1623 |
| #185 | dttdrv | 1.3043 |
| #193 | KHUCHAN | 1.2917 |
| #218 | bopmite | 1.1248 |
| #237 | takoyakisoft | 1.8389 |
| #263 | Dannybc123 | 1.5382 |
| #286 | chris-buckley | 1.1628 |
| #292 | xuafeng | 1.3274 |
| #292 | xuafeng | 1.3274 |
| #316 | SkywardSyntax | 1.2035 |
| #345 | anandks2006 | 1.8522 |
| #356 | sjp611 | 1.8338 |
| #375 | charmquark1984 | 1.1257 |
| #397 | translatingthename | 1.1364 |
| #400 | chanwoo-park-official | 1.1296 |
| #410 | EthanYangTW | 1.1216 |
| #414 | signalrush | 1.1233 |
| #415 | EthanYangTW | 1.1216 |
| #415 | EthanYangTW | 1.1216 |
| #429 | AbhisekBasu1 | 1.1231 |
| #432 | jadechip | 1.5295 |
| #457 | carlesonielfa | 1.1839 |
| #469 | cmcdnd | 1.1418 |
| #478 | gowtham0992 | 1.1268 |
| #483 | tmustier | 1.1346 |
| #503 | EthanYangTW | 1.1195 |
| #508 | newjordan | 1.1215 |
| #527 | Shuvam-Banerji-Seal | 1.4078 |
| #528 | EthanYangTW | 1.1195 |
| #529 | EthanYangTW | 1.1195 |
| #534 | rarce | 1.1804 |
| #545 | EthanYangTW | 1.1179 |
| #589 | RoyiRa | 1.1178 |
| #607 | Neopolita | 1.4750 |
| #617 | ryanadamsai | 1.1228 |
| #674 | newjordan | 1.0461 |
| #680 | bro4all | 1.1483 |
| #681 | Alfaxad | 1.4775 |
| #692 | EthanYangTW | 1.1186 |
| #707 | Shuvam-Banerji-Seal | 1.4078 |
| #712 | Shuvam-Banerji-Seal | 1.4078 |
| #714 | Upsalla | 1.1187 |
| #719 | Shuvam-Banerji-Seal | 1.5252 |
| #725 | Shuvam-Banerji-Seal | 1.5252 |
| #730 | janwww | 1.1570 |
| #751 | Shuvam-Banerji-Seal | 1.5252 |
| #785 | SirSaltySalmon | 1.5364 |
| #790 | danialht | 1.1172 |
| #836 | autocode-rayes | 1.1219 |
| #852 | Prush69 | 1.1189 |
| #872 | gowtham0992 | 1.0467 |
| #882 | IshiPareek | 1.3762 |
| #885 | lolrazh | 0.9958 |
| #890 | sofiabod | 0.4405 |
| #891 | robbiebusinessacc | 1.1428 |
| #903 | CiprianFlorin-Ifrim | 1.2064 |
| #918 | haikosys | 0.1653 |
| #918 | haikosys | 0.1653 |
| #918 | haikosys | 0.1653 |
| #920 | CiprianFlorin-Ifrim | 1.1539 |
| #923 | CiprianFlorin-Ifrim | 1.1090 |
| #928 | autocode-rayes | 1.1211 |
| #952 | FlashyFlash3011 | 1.1144 |
| #972 | Idan3011 | 0.3922 |
| #975 | Abhishek8108 | 1.1216 |
| #989 | alexanderaperry-arch | 1.1402 |
| #996 | Idan3011 | 1.1478 |
| #1002 | SoHarshh | 1.1650 |
| #1007 | dillon-blake | 1.2252 |
| #1009 | SoHarshh | 1.1574 |
| #1037 | TimPietruskyRunPod | 1.1179 |
| #1040 | JoeProAI | 1.1336 |
| #1072 | vimeto | 1.1170 |
| #1081 | michaelwinczuk | 1.1220 |
| #1088 | serdardoesml | 1.2542 |
| #1096 | vimeto | 1.3342 |
| #1096 | vimeto | 1.3342 |
| #1107 | mradassaad | 1.5633 |
| #1108 | DbBested | 1.1502 |
| #1111 | MichaelMcCulloch | 0.2532 |
| #1140 | newjordan | 1.1874 |
| #1172 | dexhunter | 1.1015 |
| #1179 | dexhunter | 1.1105 |
| #1200 | Mister2005 | 1.6768 |
| #1201 | Mister2005 | 1.6371 |
| #1227 | himanshudongre | 1.4841 |
| #1246 | deborahnelson8788726 | 0.9650 |
| #1254 | Elarwei001 | 1.1070 |
| #1256 | oidebrett | 1.1444 |
| #1260 | dexhunter | 1.0929 |
| #1270 | VirajDeshwal | 1.1088 |
| #1273 | DushyantChetiwal | 1.2196 |
| #1275 | ranausmanai | 1.1492 |
| #1289 | MatoTeziTanka | 1.0819 |
| #1302 | vlivashkin | 1.1078 |
| #1305 | DariusFeher | 1.2070 |
| #1318 | renqianluo | 1.0095 |
| #1346 | shasank0001 | 1.2283 |
| #1347 | shasank0001 | 1.3038 |
| #1357 | mollahasani | 1.2200 |
| #1366 | yunoshev | 1.1371 |
| #1417 | BruhTheMomentum | 1.3039 |
| #1418 | Park-Tae-Hwan | 1.4192 |
| #1431 | Idan3011 | 1.1266 |
| #1433 | mtybadger | 1.2067 |
| #1436 | DevWizard-Vandan | 1.5546 |
| #1574 | KRGulaj | 1.3587 |
| #1579 | Tonyy1977 | 1.1372 |
| #1602 | SPThole | 1.0744 |
| #1620 | shiawyonglim | 1.6644 |
| #1627 | mike-ferguson | 1.3246 |
| #1630 | KevinChunye | 1.1412 |
| #1683 | yunoshev | 1.1280 |
| #1683 | yunoshev | 1.1280 |
| #1718 | himanshudongre | 1.0788 |
| #1757 | aiejvn | 1.5194 |
| #1760 | BrandtChristian | 1.1863 |

Hyperparameters Across PRs
| PR | Bits | Scope |
|---|---|---|
| 30 | — | post-quantization gap |
| 79 | 6 | all |
| 117 | — | weights |
| 126 | 2 | all weights |
| 130 | 8 | large matrices (>65K params) |
| 145 | 8 | per-row weights |
| 160 | — | submission artifact / timed run support, but not activated before stop |
| 185 | 8 | model weights |
| 193 | 8 | export-matched int8 path |
| 218 | — | all |
| 237 | — | all |
| 263 | 8 | weights during training |
| 286 | — | final phase only |
| 292 | 5 | MLP layers |
| 292 | 6 | attention layers |
| 316 | 7 | all |
| 345 | 8 | all |
| 356 | 6 | all |
| 375 | 4 | full-run |
| 397 | 6 | all |
| 400 | 6 | mlp, attn |
| 410 | 6 | attention; int5 for MLP layers |
| 414 | 6 | model weights |
| 415 | 6 | attention |
| 415 | 5 | MLP |
| 429 | — | all |
| 432 | — | attention |
| 457 | 8 | all |
| 469 | 5 | all |
| 478 | 6 | all |
| 483 | 6 | all |
| 503 | 6 | all |
| 508 | 6 | weights |
| 527 | 6 | — |
| 528 | 6 | all |
| 529 | 6 | all |
| 534 | — | late training |
| 545 | 5 | all weights |
| 589 | 6 | all |
| 607 | 8 | per-row |
| 617 | — | all |
| 674 | 6 | all |
| 680 | 6 | all |
| 681 | 6 | all |
| 692 | 6 | all weights |
| 707 | 6 | all |
| 712 | 6 | all |
| 714 | 6 | all |
| 719 | 6 | all |
| 725 | 6 | all |
| 730 | 8 | fp_params / model artifact |
| 751 | — | all |
| 785 | — | late QAT |
| 790 | — | mixed int6 GPTQ with early QAT |
| 836 | 6 | all |
| 852 | 4 | all |
| 872 | — | all |
| 882 | — | all |
| 885 | — | all |
| 890 | — | all |
| 891 | 6 | int6 |
| 903 | 4 | large weights |
| 918 | 2 | MLP up |
| 918 | 3 | attn/MLP down |
| 918 | 4 | embeddings |
| 920 | 8 | FP8 path / model artifact |
| 923 | — | all |
| 928 | 6 | all |
| 952 | 6 | all |
| 972 | 6 | all |
| 975 | 6 | all |
| 989 | 6 | all |
| 996 | 6 | all |
| 1002 | 4 | MLP and bigram; INT6 attention |
| 1007 | — | all |
| 1009 | 4 | MLP + bigram |
| 1037 | — | all |
| 1040 | 5 | all |
| 1072 | — | all |
| 1081 | — | all |
| 1088 | 8 | model weights |
| 1096 | 6 | shared block weights |
| 1096 | 6 | shared weights |
| 1107 | 6 | Mamba projections and standard CastedLinear layers |
| 1108 | 6 | all |
| 1111 | 6 | all |
| 1140 | 8 | model |
| 1172 | — | late |
| 1179 | — | all |
| 1200 | 6 | Q/K/V/O and MLP up/down bank slices |
| 1201 | 6 | bank slices (Q/K/V/O and MLP up/down) |
| 1227 | 5 | all |
| 1246 | — | all large weight matrices |
| 1254 | 6 | all |
| 1256 | 6 | all |
| 1260 | — | all |
| 1270 | — | model |
| 1273 | 1.58 | all |
| 1275 | 8 | all |
| 1289 | — | model weights |
| 1302 | 6 | all |
| 1305 | — | all |
| 1318 | — | all |
| 1346 | — | MLP |
| 1347 | — | MLP |
| 1357 | 6 | all |
| 1366 | — | MLP int5, attention int6 |
| 1417 | — | all weights |
| 1418 | 4 | model weights |
| 1431 | 5 | MLP layers |
| 1433 | — | codebook weights |
| 1436 | 6 | all |
| 1574 | 6 | weights |
| 1579 | 6 | large weight matrices |
| 1602 | — | all |
| 1620 | 6 | all |
| 1627 | 8 | training export |
| 1630 | — | all |
| 1683 | 4 | MLP |
| 1683 | 5 | attention |
| 1718 | 6 | matrices only |
| 1757 | 6 | all |
| 1760 | 6 | all layers |
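Most entries above apply symmetric fake quantization to the weights during training (int6 over all weights is the most common configuration), keeping a full-precision master copy and passing gradients through the rounding step. As an illustration only (not taken from any specific PR), a minimal per-tensor fake-quantization step might look like this; the function name and rounding scheme are assumptions:

```python
def fake_quantize(weights, bits=6):
    """Symmetric per-tensor fake quantization (illustrative sketch).

    Rounds each weight onto a signed integer grid of the given bit
    width, then rescales back to float. In QAT the forward pass uses
    these snapped values while the optimizer updates the original
    full-precision weights (straight-through estimator).
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 31 levels per side for int6
    # Scale so the largest-magnitude weight maps to +/- qmax;
    # fall back to 1.0 for an all-zero tensor.
    scale = max(abs(min(weights)), abs(max(weights))) / qmax or 1.0
    return [round(w / scale) * scale for w in weights]
```

Per-row variants (e.g. PRs #145 and #607) compute a separate `scale` for each weight-matrix row instead of one per tensor, which tightens the grid around each row's dynamic range.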