# STE QAT Quantization

Quantization-aware training with a straight-through estimator (STE).

Used in 106 PRs · Best BPB: 0.1653 · Avg BPB: 1.1984

## Submissions
| PR | Author | BPB |
|---|---|---|
| #37 | khasinski | 1.2012 |
| #63 | yahya010 | 1.1598 (record) |
| #69 | TevBenji | 1.1708 |
| #89 | vmfunc | 1.1622 |
| #107 | m0at | 1.1648 |
| #108 | kellyvv | 1.4370 |
| #116 | abhishekgahlot2 | 1.1666 |
| #120 | andrewgcodes | 0.9588 |
| #123 | saikrishnarallabandi | 1.1642 |
| #128 | rsavitt | 1.1594 |
| #131 | Billy1900 | 1.2701 |
| #137 | abhishekgahlot2 | 1.1666 |
| #139 | ksang123 | 1.2029 |
| #150 | yahya010 | 1.1478 |
| #170 | baudrillardsgh0st | 1.1669 |
| #185 | dttdrv | 1.3043 |
| #190 | newjordan | 1.1725 |
| #192 | baudrillardsgh0st | 1.1502 |
| #194 | baudrillardsgh0st | 1.1480 |
| #200 | khasinski | 1.2012 |
| #225 | dibdabo | 1.2089 |
| #232 | kellyvv | 1.4370 |
| #238 | kellyvv | 1.5164 |
| #273 | dentity007 | 1.1575 |
| #295 | gowtham0992 | 1.1477 |
| #295 | gowtham0992 | 1.1477 |
| #297 | davidpuertolas | 1.1629 |
| #301 | lookin-zz | 1.1807 |
| #304 | Bortlesboat | 1.4245 |
| #304 | Bortlesboat | 1.4245 |
| #306 | xuafeng | 1.1448 |
| #306 | xuafeng | 1.1448 |
| #324 | crony-io | 1.1702 |
| #324 | crony-io | 1.1702 |
| #326 | crony-io | 1.2890 |
| #344 | aryanbhosale | 1.1330 |
| #348 | EthanYangTW | 1.1444 |
| #348 | EthanYangTW | 1.1444 |
| #358 | adityagupta26 | 1.1400 |
| #359 | tmustier | 1.1345 |
| #360 | MultiFe22 | 1.1426 |
| #360 | MultiFe22 | 1.1426 |
| #372 | HyperPotatoNeo | 1.1361 |
| #372 | HyperPotatoNeo | 1.1361 |
| #374 | unnir | 1.1246 (record) |
| #383 | joelnishanth | 1.1320 |
| #385 | dentity007 | 1.1488 |
| #389 | trasnake87 | 1.1466 |
| #401 | newjordan | 1.1243 |
| #433 | Robby955 | 1.3441 |
| #440 | Ashutosh3142857 | 1.2219 |
| #450 | zachgoldfine44 | 1.1466 |
| #454 | nalediym | 1.2055 |
| #455 | kasimte | 1.1299 |
| #531 | pragnyanramtha | 1.1324 |
| #559 | Parswanadh | 1.5348 |
| #573 | Sarimsaljook | 1.0523 |
| #575 | k-oconnor | 1.1750 |
| #667 | suchitj2702 | 1.1352 |
| #670 | abaybektursun | 1.1171 |
| #695 | 0xNoramiya | 1.1360 |
| #696 | gravelBridge | 1.2622 |
| #709 | StolbaJ | 1.1478 |
| #709 | StolbaJ | 1.1478 |
| #710 | Dhruba531 | 1.1240 |
| #754 | aryanbhosale | 1.1253 |
| #760 | erikqu | 1.2185 |
| #805 | zeytx | 1.1807 |
| #816 | jimliu741523 | 1.1194 |
| #842 | JUSTSUJAY | 1.3380 |
| #892 | robbiebusinessacc | 1.1428 |
| #915 | anthony-maio | 0.9642 |
| #918 | haikosys | 0.1653 |
| #927 | Tonyy1977 | 1.1696 |
| #929 | andreanjos | 1.1653 |
| #979 | 0xadvait | 1.1387 |
| #989 | alexanderaperry-arch | 1.1402 |
| #1032 | wfproc | 1.3631 |
| #1045 | Hilo-Hilo | 1.1509 |
| #1057 | Programmerryoki | 1.2201 |
| #1067 | dheeren-tejani | 1.4242 |
| #1068 | LappyG | 1.1510 |
| #1070 | manfromnowhere143 | 1.1190 |
| #1077 | malc3om | 1.1130 |
| #1087 | Dhenenjay | 1.1407 |
| #1087 | Dhenenjay | 1.1407 |
| #1154 | LucasErcolano | 1.7757 |
| #1202 | VirajDeshwal | 1.1412 |
| #1227 | himanshudongre | 1.4841 |
| #1228 | meinlebenswerk | 1.1527 |
| #1284 | tyrel-beede | 1.1207 |
| #1290 | aryanbhosale | 1.1104 |
| #1357 | mollahasani | 1.2200 |
| #1385 | korentomas | 1.4465 |
| #1388 | CiprianFlorin-Ifrim | 1.5390 |
| #1417 | BruhTheMomentum | 1.3039 |
| #1467 | PhamPhuHoa-23 | 1.1056 |
| #1484 | AlirezaAlampour | 1.6656 |
| #1486 | AlirezaAlampour | 1.6656 |
| #1509 | Lumi-node | 1.1962 |
| #1512 | Itssshikhar | 1.1117 |
| #1559 | adityasasidhar | 1.2498 |
| #1582 | He-Wenhao | 1.3428 |
| #1602 | SPThole | 1.0744 |
| #1621 | mrbese | 1.1531 |
| #1640 | thestbobo | 1.1412 |

## Hyperparameters Across PRs
| PR | Bits | Scope |
|---|---|---|
| 37 | 6 | all |
| 63 | 6 | all 2D block weights |
| 69 | 6 | block weights |
| 89 | 6 | per-row block weights |
| 107 | — | post-training quantization-aware training |
| 108 | 6 | all |
| 116 | 6 | MLP and attention weights; fp16 passthrough for tied embedding and small/control tensors |
| 120 | 6 | transformer blocks |
| 123 | 6 | weights |
| 128 | 6 | weights |
| 131 | 6 | transformer block weights |
| 137 | 6 | MLP and attention weights; fp16 passthrough for tied embedding |
| 139 | 2 | all linear layers (attention and MLP); ternary {-1, 0, 1} weights |
| 150 | 6 | all |
| 170 | 6 | all weights |
| 185 | 8 | model weights |
| 190 | 6 | all weight matrices except embeddings |
| 192 | 6 | all |
| 194 | 6 | all weights with fp16 tied embeddings |
| 200 | 6 | all |
| 225 | 6 | large matrices / model weights |
| 232 | 6 | all |
| 238 | — | all |
| 273 | 6 | all |
| 295 | 5 | MLP |
| 295 | 6 | attention |
| 297 | 6 | MLP and attention weight matrices / full model quantized artifact |
| 301 | 6 | all weights |
| 304 | 5 | MLP layers |
| 304 | 6 | attention layers |
| 306 | 5 | MLP |
| 306 | 6 | attention |
| 324 | 5 | MLPs |
| 324 | 6 | Attention |
| 326 | 5 | MLPs and Attention |
| 344 | — | final 15% of training |
| 348 | 5 | MLP |
| 348 | 6 | attention |
| 358 | 8 | all |
| 359 | 6 | all |
| 360 | 5 | MLP |
| 360 | 6 | attention |
| 372 | 6 | attention weights |
| 372 | 5 | MLP weights |
| 374 | 6 | MLP + attention weights |
| 383 | 6 | MLP + attention weights |
| 385 | 6 | all |
| 389 | 5 | final ~5% of training |
| 401 | 6 | MLP + attention weights |
| 433 | 6 | all |
| 440 | 4 | MLP |
| 450 | 6 | all |
| 454 | 6 | all |
| 455 | 6 | MLP and attention weights; int8 for embeddings |
| 531 | 6 | weights during backward pass when LR < 15% peak |
| 559 | 1 | MLP |
| 573 | 6 | late QAT when LR scale < 0.15 |
| 575 | 8 | embeddings |
| 667 | 6 | all bank parameters |
| 670 | — | all |
| 695 | 6 | MLP and attention weights |
| 696 | 6 | all weights |
| 709 | 5 | MLP |
| 709 | 6 | attention and bigram-proj |
| 710 | 6 | model weights |
| 754 | 6 | all weights |
| 760 | 2 | all weights |
| 805 | — | all |
| 816 | 6 | all |
| 842 | 8 | all |
| 892 | 6 | int6 |
| 915 | — | model |
| 918 | — | all |
| 927 | 6 | large weight matrices |
| 929 | 6 | all |
| 979 | 6 | attn/MLP weights |
| 989 | 6 | all |
| 1032 | 6 | all |
| 1045 | 6 | all |
| 1057 | 6 | all |
| 1067 | — | block weights |
| 1068 | 6 | all large weight matrices |
| 1070 | 6 | late QAT |
| 1077 | 6 | mixed; MLP int5, attention int6 |
| 1087 | 5 | MLP |
| 1087 | 6 | attention |
| 1154 | — | structural weights |
| 1202 | 6 | all |
| 1227 | 5 | all |
| 1228 | 6 | all |
| 1284 | 6 | parameter banks |
| 1290 | — | final 15% wallclock |
| 1357 | 6 | all |
| 1385 | 8 | all |
| 1388 | — | weights and activations |
| 1417 | — | all weights |
| 1467 | — | all |
| 1484 | 8 | forward pass |
| 1486 | 8 | all weight matrices |
| 1509 | 4 | all |
| 1512 | 6 | all F.linear params |
| 1559 | 8 | selected CastedLinear weights |
| 1582 | 8 | weights |
| 1602 | — | all |
| 1621 | 6 | all |
| 1640 | 6 | all linear weights |
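Nearly every entry above applies the same core mechanism: fake-quantize a subset of the weights during training, and let gradients flow through the rounding step unchanged via the straight-through estimator. A minimal PyTorch sketch of symmetric per-tensor STE fake quantization (the helper name `fake_quant_ste` and the per-tensor absmax scaling are illustrative assumptions, not any particular PR's implementation):

```python
import torch

def fake_quant_ste(w: torch.Tensor, bits: int = 6) -> torch.Tensor:
    """Symmetric per-tensor fake quantization to signed `bits`-bit integers,
    with a straight-through estimator for the backward pass."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 31 for int6
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    # Round to the integer grid, clamp to the representable range, rescale.
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # STE: forward pass uses the quantized values, backward pass sees identity.
    return w + (q - w).detach()
```

The submissions differ mainly along three axes visible in the hyperparameter table: the bit-width (`bits`, with int5 for MLP and int6 for attention a common mixed-precision split), the scope of tensors quantized (MLP vs. attention vs. all weights), and when QAT switches on (several PRs enable it only for the final ~5–15% of training or once the learning rate has decayed below ~15% of its peak).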