# STE QAT Quantization

Quantization-aware training with a straight-through estimator (STE).

Used in 106 PRs · Best BPB: 0.1653 · Avg BPB: 1.1984

## Submissions
| PR | Author | BPB |
|---|---|---|
| #37 | khasinski | 1.2012 |
| #63 | yahya010 | 1.1598 (record) |
| #69 | TevBenji | 1.1708 |
| #89 | vmfunc | 1.1622 |
| #107 | m0at | 1.1648 |
| #108 | kellyvv | 1.4370 |
| #116 | abhishekgahlot2 | 1.1666 |
| #120 | andrewgcodes | 0.9588 |
| #123 | saikrishnarallabandi | 1.1642 |
| #128 | rsavitt | 1.1594 |
| #131 | Billy1900 | 1.2701 |
| #137 | abhishekgahlot2 | 1.1666 |
| #139 | ksang123 | 1.2029 |
| #150 | yahya010 | 1.1478 |
| #170 | baudrillardsgh0st | 1.1669 |
| #185 | dttdrv | 1.3043 |
| #190 | newjordan | 1.1725 |
| #192 | baudrillardsgh0st | 1.1502 |
| #194 | baudrillardsgh0st | 1.1480 |
| #200 | khasinski | 1.2012 |
| #225 | dibdabo | 1.2089 |
| #232 | kellyvv | 1.4370 |
| #238 | kellyvv | 1.5164 |
| #273 | dentity007 | 1.1575 |
| #295 | gowtham0992 | 1.1477 |
| #295 | gowtham0992 | 1.1477 |
| #297 | davidpuertolas | 1.1629 |
| #301 | lookin-zz | 1.1807 |
| #304 | Bortlesboat | 1.4245 |
| #304 | Bortlesboat | 1.4245 |
| #306 | xuafeng | 1.1448 |
| #306 | xuafeng | 1.1448 |
| #324 | crony-io | 1.1702 |
| #324 | crony-io | 1.1702 |
| #326 | crony-io | 1.2890 |
| #344 | aryanbhosale | 1.1330 |
| #348 | EthanYangTW | 1.1444 |
| #348 | EthanYangTW | 1.1444 |
| #358 | adityagupta26 | 1.1400 |
| #359 | tmustier | 1.1345 |
| #360 | MultiFe22 | 1.1426 |
| #360 | MultiFe22 | 1.1426 |
| #372 | HyperPotatoNeo | 1.1361 |
| #372 | HyperPotatoNeo | 1.1361 |
| #374 | unnir | 1.1246 (record) |
| #383 | joelnishanth | 1.1320 |
| #385 | dentity007 | 1.1488 |
| #389 | trasnake87 | 1.1466 |
| #401 | newjordan | 1.1243 |
| #433 | Robby955 | 1.3441 |
| #440 | Ashutosh3142857 | 1.2219 |
| #450 | zachgoldfine44 | 1.1466 |
| #454 | nalediym | 1.2055 |
| #455 | kasimte | 1.1299 |
| #531 | pragnyanramtha | 1.1324 |
| #559 | Parswanadh | 1.5348 |
| #573 | Sarimsaljook | 1.0523 |
| #575 | k-oconnor | 1.1750 |
| #667 | suchitj2702 | 1.1352 |
| #670 | abaybektursun | 1.1171 |
| #695 | 0xNoramiya | 1.1360 |
| #696 | gravelBridge | 1.2622 |
| #709 | StolbaJ | 1.1478 |
| #709 | StolbaJ | 1.1478 |
| #710 | Dhruba531 | 1.1240 |
| #754 | aryanbhosale | 1.1253 |
| #760 | erikqu | 1.2185 |
| #805 | zeytx | 1.1807 |
| #816 | jimliu741523 | 1.1194 |
| #842 | JUSTSUJAY | 1.3380 |
| #892 | robbiebusinessacc | 1.1428 |
| #915 | anthony-maio | 0.9642 |
| #918 | haikosys | 0.1653 |
| #927 | Tonyy1977 | 1.1696 |
| #929 | andreanjos | 1.1653 |
| #979 | 0xadvait | 1.1387 |
| #989 | alexanderaperry-arch | 1.1402 |
| #1032 | wfproc | 1.3631 |
| #1045 | Hilo-Hilo | 1.1509 |
| #1057 | Programmerryoki | 1.2201 |
| #1067 | dheeren-tejani | 1.4242 |
| #1068 | LappyG | 1.1510 |
| #1070 | manfromnowhere143 | 1.1190 |
| #1077 | malc3om | 1.1130 |
| #1087 | Dhenenjay | 1.1407 |
| #1087 | Dhenenjay | 1.1407 |
| #1154 | LucasErcolano | 1.7757 |
| #1202 | VirajDeshwal | 1.1412 |
| #1227 | himanshudongre | 1.4841 |
| #1228 | meinlebenswerk | 1.1527 |
| #1284 | tyrel-beede | 1.1207 |
| #1290 | aryanbhosale | 1.1104 |
| #1357 | mollahasani | 1.2200 |
| #1385 | korentomas | 1.4465 |
| #1388 | CiprianFlorin-Ifrim | 1.5390 |
| #1417 | BruhTheMomentum | 1.3039 |
| #1467 | PhamPhuHoa-23 | 1.1056 |
| #1484 | AlirezaAlampour | 1.6656 |
| #1486 | AlirezaAlampour | 1.6656 |
| #1509 | Lumi-node | 1.1962 |
| #1512 | Itssshikhar | 1.1117 |
| #1559 | adityasasidhar | 1.2498 |
| #1582 | He-Wenhao | 1.3428 |
| #1602 | SPThole | 1.0744 |
| #1621 | mrbese | 1.1531 |
| #1640 | thestbobo | 1.1412 |

## Hyperparameters Across PRs
| PR | Bits | Scope |
|---|---|---|
| 37 | 6 | all |
| 63 | 6 | all 2D block weights |
| 69 | 6 | block weights |
| 89 | 6 | per-row block weights |
| 107 | — | post-training quantization-aware training |
| 108 | 6 | all |
| 116 | 6 | MLP and attention weights; fp16 passthrough for tied embedding and small/control tensors |
| 120 | 6 | transformer blocks |
| 123 | 6 | weights |
| 128 | 6 | weights |
| 131 | 6 | transformer block weights |
| 137 | 6 | MLP and attention weights; fp16 passthrough for tied embedding |
| 139 | 2 | all linear layers (attention and MLP); ternary {-1, 0, 1} weights |
| 150 | 6 | all |
| 170 | 6 | all weights |
| 185 | 8 | model weights |
| 190 | 6 | all weight matrices except embeddings |
| 192 | 6 | all |
| 194 | 6 | all weights with fp16 tied embeddings |
| 200 | 6 | all |
| 225 | 6 | large matrices / model weights |
| 232 | 6 | all |
| 238 | — | all |
| 273 | 6 | all |
| 295 | 5 | MLP |
| 295 | 6 | attention |
| 297 | 6 | MLP and attention weight matrices / full model quantized artifact |
| 301 | 6 | all weights |
| 304 | 5 | MLP layers |
| 304 | 6 | attention layers |
| 306 | 5 | MLP |
| 306 | 6 | attention |
| 324 | 5 | MLPs |
| 324 | 6 | Attention |
| 326 | 5 | MLPs and Attention |
| 344 | — | final 15% of training |
| 348 | 5 | MLP |
| 348 | 6 | attention |
| 358 | 8 | all |
| 359 | 6 | all |
| 360 | 5 | MLP |
| 360 | 6 | attention |
| 372 | 6 | attention weights |
| 372 | 5 | MLP weights |
| 374 | 6 | MLP + attention weights |
| 383 | 6 | MLP + attention weights |
| 385 | 6 | all |
| 389 | 5 | final ~5% of training |
| 401 | 6 | MLP + attention weights |
| 433 | 6 | all |
| 440 | 4 | MLP |
| 450 | 6 | all |
| 454 | 6 | all |
| 455 | 6 | MLP and attention weights; int8 for embeddings |
| 531 | 6 | weights during backward pass when LR < 15% peak |
| 559 | 1 | MLP |
| 573 | 6 | late QAT when LR scale < 0.15 |
| 575 | 8 | embeddings |
| 667 | 6 | all bank parameters |
| 670 | — | all |
| 695 | 6 | MLP and attention weights |
| 696 | 6 | all weights |
| 709 | 5 | MLP |
| 709 | 6 | attention and bigram-proj |
| 710 | 6 | model weights |
| 754 | 6 | all weights |
| 760 | 2 | all weights |
| 805 | — | all |
| 816 | 6 | all |
| 842 | 8 | all |
| 892 | 6 | int6 |
| 915 | — | model |
| 918 | — | all |
| 927 | 6 | large weight matrices |
| 929 | 6 | all |
| 979 | 6 | attn/MLP weights |
| 989 | 6 | all |
| 1032 | 6 | all |
| 1045 | 6 | all |
| 1057 | 6 | all |
| 1067 | — | block weights |
| 1068 | 6 | all large weight matrices |
| 1070 | 6 | late QAT |
| 1077 | 6 | mixed; MLP int5, attention int6 |
| 1087 | 5 | MLP |
| 1087 | 6 | attention |
| 1154 | — | structural weights |
| 1202 | 6 | all |
| 1227 | 5 | all |
| 1228 | 6 | all |
| 1284 | 6 | parameter banks |
| 1290 | — | final 15% wallclock |
| 1357 | 6 | all |
| 1385 | 8 | all |
| 1388 | — | weights and activations |
| 1417 | — | all weights |
| 1467 | — | all |
| 1484 | 8 | forward pass |
| 1486 | 8 | all weight matrices |
| 1509 | 4 | all |
| 1512 | 6 | all F.linear params |
| 1559 | 8 | selected CastedLinear weights |
| 1582 | 8 | weights |
| 1602 | — | all |
| 1621 | 6 | all |
| 1640 | 6 | all linear weights |
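Nearly every entry above applies the same core mechanism: fake-quantize a subset of the weights during training, and let gradients flow through the rounding step unchanged via the straight-through estimator. A minimal PyTorch sketch of symmetric per-tensor STE fake quantization (the helper name `fake_quant_ste` and the per-tensor absmax scaling are illustrative assumptions, not any particular PR's implementation):

```python
import torch

def fake_quant_ste(w: torch.Tensor, bits: int = 6) -> torch.Tensor:
    """Symmetric per-tensor fake quantization to signed `bits`-bit integers,
    with a straight-through estimator for the backward pass."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 31 for int6
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    # Round to the integer grid, clamp to the representable range, rescale.
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # STE: forward pass uses the quantized values, backward pass sees identity.
    return w + (q - w).detach()
```

The submissions differ mainly along three axes visible in the hyperparameter table: the bit-width (`bits`, with int5 for MLP and int6 for attention a common mixed-precision split), the scope of tensors quantized (MLP vs. attention vs. all weights), and when QAT switches on (several PRs enable it only for the final ~5–15% of training or once the learning rate has decayed below ~15% of its peak).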