QAT (Quantization)

- Used in: 123 PRs
- Best BPB: 0.1653
- Avg BPB: 1.2017
Submissions
| PR | Author | BPB |
|---|---|---|
| #30 | JackYoung27 | 1.2663 |
| #79 | Marvbuster | 1.8698 |
| #117 | trovatochris | 1.1702 |
| #126 | Athenox14 | 1.7510 |
| #130 | mohosy | 1.6372 |
| #145 | mrdavtan | 1.2052 |
| #160 | ChaseWNorton | 1.1623 |
| #185 | dttdrv | 1.3043 |
| #193 | KHUCHAN | 1.2917 |
| #218 | bopmite | 1.1248 |
| #237 | takoyakisoft | 1.8389 |
| #263 | Dannybc123 | 1.5382 |
| #286 | chris-buckley | 1.1628 |
| #292 | xuafeng | 1.3274 |
| #292 | xuafeng | 1.3274 |
| #316 | SkywardSyntax | 1.2035 |
| #345 | anandks2006 | 1.8522 |
| #356 | sjp611 | 1.8338 |
| #375 | charmquark1984 | 1.1257 |
| #397 | translatingthename | 1.1364 |
| #400 | chanwoo-park-official | 1.1296 |
| #410 | EthanYangTW | 1.1216 |
| #414 | signalrush | 1.1233 |
| #415 | EthanYangTW | 1.1216 |
| #415 | EthanYangTW | 1.1216 |
| #429 | AbhisekBasu1 | 1.1231 |
| #432 | jadechip | 1.5295 |
| #457 | carlesonielfa | 1.1839 |
| #469 | cmcdnd | 1.1418 |
| #478 | gowtham0992 | 1.1268 |
| #483 | tmustier | 1.1346 |
| #503 | EthanYangTW | 1.1195 |
| #508 | newjordan | 1.1215 |
| #527 | Shuvam-Banerji-Seal | 1.4078 |
| #528 | EthanYangTW | 1.1195 |
| #529 | EthanYangTW | 1.1195 |
| #534 | rarce | 1.1804 |
| #545 | EthanYangTW | 1.1179 |
| #589 | RoyiRa | 1.1178 |
| #607 | Neopolita | 1.4750 |
| #617 | ryanadamsai | 1.1228 |
| #674 | newjordan | 1.0461 |
| #680 | bro4all | 1.1483 |
| #681 | Alfaxad | 1.4775 |
| #692 | EthanYangTW | 1.1186 |
| #707 | Shuvam-Banerji-Seal | 1.4078 |
| #712 | Shuvam-Banerji-Seal | 1.4078 |
| #714 | Upsalla | 1.1187 |
| #719 | Shuvam-Banerji-Seal | 1.5252 |
| #725 | Shuvam-Banerji-Seal | 1.5252 |
| #730 | janwww | 1.1570 |
| #751 | Shuvam-Banerji-Seal | 1.5252 |
| #785 | SirSaltySalmon | 1.5364 |
| #790 | danialht | 1.1172 |
| #836 | autocode-rayes | 1.1219 |
| #852 | Prush69 | 1.1189 |
| #872 | gowtham0992 | 1.0467 |
| #882 | IshiPareek | 1.3762 |
| #885 | lolrazh | 0.9958 |
| #890 | sofiabod | 0.4405 |
| #891 | robbiebusinessacc | 1.1428 |
| #903 | CiprianFlorin-Ifrim | 1.2064 |
| #918 | haikosys | 0.1653 |
| #918 | haikosys | 0.1653 |
| #918 | haikosys | 0.1653 |
| #920 | CiprianFlorin-Ifrim | 1.1539 |
| #923 | CiprianFlorin-Ifrim | 1.1090 |
| #928 | autocode-rayes | 1.1211 |
| #952 | FlashyFlash3011 | 1.1144 |
| #972 | Idan3011 | 0.3922 |
| #975 | Abhishek8108 | 1.1216 |
| #989 | alexanderaperry-arch | 1.1402 |
| #996 | Idan3011 | 1.1478 |
| #1002 | SoHarshh | 1.1650 |
| #1007 | dillon-blake | 1.2252 |
| #1009 | SoHarshh | 1.1574 |
| #1037 | TimPietruskyRunPod | 1.1179 |
| #1040 | JoeProAI | 1.1336 |
| #1072 | vimeto | 1.1170 |
| #1081 | michaelwinczuk | 1.1220 |
| #1088 | serdardoesml | 1.2542 |
| #1096 | vimeto | 1.3342 |
| #1096 | vimeto | 1.3342 |
| #1107 | mradassaad | 1.5633 |
| #1108 | DbBested | 1.1502 |
| #1111 | MichaelMcCulloch | 0.2532 |
| #1140 | newjordan | 1.1874 |
| #1172 | dexhunter | 1.1015 |
| #1179 | dexhunter | 1.1105 |
| #1200 | Mister2005 | 1.6768 |
| #1201 | Mister2005 | 1.6371 |
| #1227 | himanshudongre | 1.4841 |
| #1246 | deborahnelson8788726 | 0.9650 |
| #1254 | Elarwei001 | 1.1070 |
| #1256 | oidebrett | 1.1444 |
| #1260 | dexhunter | 1.0929 |
| #1270 | VirajDeshwal | 1.1088 |
| #1273 | DushyantChetiwal | 1.2196 |
| #1275 | ranausmanai | 1.1492 |
| #1289 | MatoTeziTanka | 1.0819 |
| #1302 | vlivashkin | 1.1078 |
| #1305 | DariusFeher | 1.2070 |
| #1318 | renqianluo | 1.0095 |
| #1346 | shasank0001 | 1.2283 |
| #1347 | shasank0001 | 1.3038 |
| #1357 | mollahasani | 1.2200 |
| #1366 | yunoshev | 1.1371 |
| #1417 | BruhTheMomentum | 1.3039 |
| #1418 | Park-Tae-Hwan | 1.4192 |
| #1431 | Idan3011 | 1.1266 |
| #1433 | mtybadger | 1.2067 |
| #1436 | DevWizard-Vandan | 1.5546 |
| #1574 | KRGulaj | 1.3587 |
| #1579 | Tonyy1977 | 1.1372 |
| #1602 | SPThole | 1.0744 |
| #1620 | shiawyonglim | 1.6644 |
| #1627 | mike-ferguson | 1.3246 |
| #1630 | KevinChunye | 1.1412 |
| #1683 | yunoshev | 1.1280 |
| #1683 | yunoshev | 1.1280 |
| #1718 | himanshudongre | 1.0788 |
| #1757 | aiejvn | 1.5194 |
| #1760 | BrandtChristian | 1.1863 |

Hyperparameters Across PRs
| PR | Bits | Scope |
|---|---|---|
| 30 | — | post-quantization gap |
| 79 | 6 | all |
| 117 | — | weights |
| 126 | 2 | all weights |
| 130 | 8 | large matrices (>65K params) |
| 145 | 8 | per-row weights |
| 160 | — | submission artifact / timed run support, but not activated before stop |
| 185 | 8 | model weights |
| 193 | 8 | export-matched int8 path |
| 218 | — | all |
| 237 | — | all |
| 263 | 8 | weights during training |
| 286 | — | final phase only |
| 292 | 5 | MLP layers |
| 292 | 6 | attention layers |
| 316 | 7 | all |
| 345 | 8 | all |
| 356 | 6 | all |
| 375 | 4 | full-run |
| 397 | 6 | all |
| 400 | 6 | mlp, attn |
| 410 | 6 | attention; int5 for MLP layers |
| 414 | 6 | model weights |
| 415 | 6 | attention |
| 415 | 5 | MLP |
| 429 | — | all |
| 432 | — | attention |
| 457 | 8 | all |
| 469 | 5 | all |
| 478 | 6 | all |
| 483 | 6 | all |
| 503 | 6 | all |
| 508 | 6 | weights |
| 527 | 6 | — |
| 528 | 6 | all |
| 529 | 6 | all |
| 534 | — | late training |
| 545 | 5 | all weights |
| 589 | 6 | all |
| 607 | 8 | per-row |
| 617 | — | all |
| 674 | 6 | all |
| 680 | 6 | all |
| 681 | 6 | all |
| 692 | 6 | all weights |
| 707 | 6 | all |
| 712 | 6 | all |
| 714 | 6 | all |
| 719 | 6 | all |
| 725 | 6 | all |
| 730 | 8 | fp_params / model artifact |
| 751 | — | all |
| 785 | — | late QAT |
| 790 | — | mixed int6 GPTQ with early QAT |
| 836 | 6 | all |
| 852 | 4 | all |
| 872 | — | all |
| 882 | — | all |
| 885 | — | all |
| 890 | — | all |
| 891 | 6 | int6 |
| 903 | 4 | large weights |
| 918 | 2 | MLP up |
| 918 | 3 | attn/MLP down |
| 918 | 4 | embeddings |
| 920 | 8 | FP8 path / model artifact |
| 923 | — | all |
| 928 | 6 | all |
| 952 | 6 | all |
| 972 | 6 | all |
| 975 | 6 | all |
| 989 | 6 | all |
| 996 | 6 | all |
| 1002 | 4 | MLP and bigram; INT6 attention |
| 1007 | — | all |
| 1009 | 4 | MLP + bigram |
| 1037 | — | all |
| 1040 | 5 | all |
| 1072 | — | all |
| 1081 | — | all |
| 1088 | 8 | model weights |
| 1096 | 6 | shared block weights |
| 1096 | 6 | shared weights |
| 1107 | 6 | Mamba projections and standard CastedLinear layers |
| 1108 | 6 | all |
| 1111 | 6 | all |
| 1140 | 8 | model |
| 1172 | — | late |
| 1179 | — | all |
| 1200 | 6 | Q/K/V/O and MLP up/down bank slices |
| 1201 | 6 | bank slices (Q/K/V/O and MLP up/down) |
| 1227 | 5 | all |
| 1246 | — | all large weight matrices |
| 1254 | 6 | all |
| 1256 | 6 | all |
| 1260 | — | all |
| 1270 | — | model |
| 1273 | 1.58 | all |
| 1275 | 8 | all |
| 1289 | — | model weights |
| 1302 | 6 | all |
| 1305 | — | all |
| 1318 | — | all |
| 1346 | — | MLP |
| 1347 | — | MLP |
| 1357 | 6 | all |
| 1366 | — | MLP int5, attention int6 |
| 1417 | — | all weights |
| 1418 | 4 | model weights |
| 1431 | 5 | MLP layers |
| 1433 | — | codebook weights |
| 1436 | 6 | all |
| 1574 | 6 | weights |
| 1579 | 6 | large weight matrices |
| 1602 | — | all |
| 1620 | 6 | all |
| 1627 | 8 | training export |
| 1630 | — | all |
| 1683 | 4 | MLP |
| 1683 | 5 | attention |
| 1718 | 6 | matrices only |
| 1757 | 6 | all |
| 1760 | 6 | all layers |
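Most entries above apply symmetric fake quantization to the weights during training (int6 over all weights is the most common configuration), keeping a full-precision master copy and passing gradients through the rounding step. As an illustration only (not taken from any specific PR), a minimal per-tensor fake-quantization step might look like this; the function name and rounding scheme are assumptions:

```python
def fake_quantize(weights, bits=6):
    """Symmetric per-tensor fake quantization (illustrative sketch).

    Rounds each weight onto a signed integer grid of the given bit
    width, then rescales back to float. In QAT the forward pass uses
    these snapped values while the optimizer updates the original
    full-precision weights (straight-through estimator).
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 31 levels per side for int6
    # Scale so the largest-magnitude weight maps to +/- qmax;
    # fall back to 1.0 for an all-zero tensor.
    scale = max(abs(min(weights)), abs(max(weights))) / qmax or 1.0
    return [round(w / scale) * scale for w in weights]
```

Per-row variants (e.g. PRs #145 and #607) compute a separate `scale` for each weight-matrix row instead of one per tensor, which tightens the grid around each row's dynamic range.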