# mixed int6/int8 Quantization

Used in 60 PRs · Best BPB: 0.4416 · Avg BPB: 1.1574
## Submissions

| pr_number | author | bpb | record |
|---|---|---|---|
| 39 | nanlliu | 1.2139 | RECORD |
| 70 | jfprincz | 1.1659 | |
| 78 | mtybadger | 1.1858 | |
| 92 | saikrishnarallabandi | 1.1938 | |
| 99 | takhir-iota | 1.1605 | |
| 104 | gwelinder | 1.3358 | |
| 120 | andrewgcodes | 0.9588 | |
| 131 | Billy1900 | 1.2701 | |
| 160 | ChaseWNorton | 1.1623 | |
| 164 | jfprincz | 1.1524 | |
| 176 | GLDRoger | 1.1732 | |
| 198 | jfprincz | 1.1318 | RECORD |
| 222 | ansh-deriv | 1.1601 | |
| 223 | 0xjaishy | 1.1326 | |
| 236 | saml212 | 1.1400 | |
| 254 | timowhite88 | 1.1303 | |
| 274 | haikosys | 1.1403 | |
| 281 | charmquark1984 | 1.1381 | |
| 287 | jfprincz | 1.1271 | RECORD |
| 309 | NewyorkDev | 1.1914 | |
| 312 | chanwoo-park-official | 1.1668 | |
| 315 | jfprincz | 1.1248 | |
| 388 | ElliotSlusky | 1.1231 | |
| 453 | Divyesh-Thirukonda | 1.1248 | |
| 492 | Divyesh-Thirukonda | 1.1591 | |
| 502 | aamodbhatt | 1.5248 | |
| 534 | rarce | 1.1804 | |
| 570 | armmer016 | 1.3434 | |
| 598 | Christopher-Lee-McClendon | 1.1334 | |
| 803 | pentxayc | 0.4416 | |
| 820 | mtybadger | 1.6252 | |
| 898 | pattern4bots | 1.1231 | |
| 958 | shouryamaanjain | 1.1382 | |
| 1042 | nothingLiva | 1.1217 | |
| 1046 | Jayteare | 1.2174 | |
| 1065 | rithunkp | 1.1536 | |
| 1080 | ciach | 1.1228 | |
| 1085 | adityasasidhar | 1.2831 | |
| 1086 | Omrigotlieb | 1.1349 | |
| 1101 | amrayach | 1.1290 | |
| 1142 | ymrohit | 1.1493 | |
| 1166 | Christopher-Lee-McClendon | 1.1347 | |
| 1204 | msisovic | 1.1063 | RECORD |
| 1205 | SergheiBrinza | 1.1431 | |
| 1389 | Rome-1 | 1.7270 | |
| 1474 | shram86 | 1.1434 | |
| 1487 | ndokutovich | 1.0600 | |
| 1495 | shram86 | 1.1077 | |
| 1517 | RulinShao | 1.0632 | |
| 1559 | adityasasidhar | 1.2498 | |
| 1617 | adityasasidhar | 1.2192 | |
| 1635 | PapaFranku4647 | 1.1063 | |
| 1647 | powerpratik | 1.0616 | |
| 1649 | joyceyan | 1.1271 | |
| 1697 | Buld1n | 1.0812 | |
| 1698 | arsenis-cmd | 1.0788 | |
| 1716 | himanshudongre | 1.0818 | |
| 1720 | kiyoaki | 1.0723 | |
| 1737 | sakthivarshans | 1.0881 | |
| 1754 | upascal | 1.0881 | |

## Hyperparameters Across PRs

| pr_number | bits | scope |
|---|---|---|
| 39 | 6 | middle layers 3-6 int6; first/last 3 layers int8 |
| 70 | 6 | int6 per-row on MLP and attention projection weights; int8 per-row on embeddings and other tensors |
| 78 | 6 | weights int6, embeddings int8 |
| 92 | 6 | weights and embeddings |
| 99 | 6 | .mlp., .attn.c_q., .attn.c_v., .attn.proj. in int6; .attn.c_k. mostly grouped int8; selected late-layer c_k and tok_emb in fp16 |
| 104 | 6 | all block matrices |
| 120 | — | transformer blocks and embeddings |
| 131 | 6 | transformer block weights; embeddings use int8 |
| 160 | 6 | most tensors, with int8 token embedding |
| 164 | 6 | MLP and attention int6; embeddings and bigram int8; controls fp32 |
| 176 | 8 | all weights by default, with middle blocks 3,4,5,6 forced to int6; embeddings and LM head kept fp16 |
| 198 | 6 | MLP and attention int6; embeddings int8 |
| 222 | 6 | layers 2-8 int6; layers 0/1/9 int8 per-row; embeddings fp16 |
| 223 | — | MLP+Attn int6, embeddings int8 |
| 236 | 6 | attention + MLP weights; int8 tok_emb |
| 254 | 6 | MLP+attention; embeddings int8; tied embeddings fp16 |
| 274 | 6 | MLP, attention, tied embeddings |
| 281 | 6 | MLP+attention; embeddings int8 |
| 287 | 6 | MLP and attention int6, embeddings int8 |
| 309 | — | boundary layers int8, middle layers int6, tied embeddings fp16, control tensors fp32 |
| 312 | 6 | MLP and attention int6; other large tensors int8 |
| 315 | 6 | MLP and attention int6; embeddings int8 |
| 388 | 6 | MLP and attention weights int6 per-row; embeddings int8 per-row |
| 453 | 6 | MLP and attention int6; embeddings int8 |
| 492 | — | — |
| 502 | 6 | attn, mlp |
| 534 | 6 | layers 1-9 int6, layers 0 and 10 int8 |
| 570 | — | all |
| 598 | — | int6 per-row for attention projections and MLP weights; int8 per-tensor for layer norms, value embeddings, biases, embedding tables |
| 803 | 6 | model weights |
| 820 | 6 | model weights |
| 898 | 6 | embeddings |
| 958 | — | all |
| 1042 | — | embeddings |
| 1046 | 6 | MLP and attention weights int6; embeddings and Markov table int8; control tensors fp16 |
| 1065 | 6 | block weights and embeddings |
| 1080 | 6 | MLP and attention; embeddings on int8 path |
| 1085 | 6 | MLP and attention projections, plus smaller tensors in int8/fp16 |
| 1086 | — | embeddings, MLP, attention |
| 1101 | 6 | MLP+attn int6; embeddings+other int8 |
| 1142 | 6 | auxiliary embeddings + main trunk int8 |
| 1166 | — | per-row weights |
| 1204 | — | model weights |
| 1205 | — | weights and embeddings |
| 1389 | — | shared layers int8, others int6 |
| 1474 | 6 | most tensors with selected sensitive tensors promoted to int8 |
| 1487 | — | weights and embeddings |
| 1495 | 6 | most tensors with selected sensitive tensors promoted to int8 |
| 1517 | — | all |
| 1559 | — | model weights |
| 1617 | 6 | attention/MLP layers |
| 1635 | 6 | model weights |
| 1647 | 6 | model weights |
| 1649 | — | MLP+attention int6, embeddings int8 |
| 1697 | — | attention and MLP matrices, embeddings |
| 1698 | 6 | matrices and embeddings |
| 1716 | 6 | matrices, embeddings, control tensors, small 2-D matrices |
| 1720 | — | int6 for attention and MLP, int8 for embeddings |
| 1737 | — | attention/MLP int6, embeddings int8 |
| 1754 | — | weights and embeddings |
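Most entries above follow the same recipe: symmetric per-row int6 for MLP and attention weight matrices, with a per-row int8 path for embeddings and other sensitive tensors. A minimal numpy sketch of that per-row scheme (illustrative only — function names are ours, and the individual PRs' implementations may differ in clipping, rounding, and which tensors they touch):

```python
import numpy as np

def quantize_per_row(w: np.ndarray, bits: int):
    """Symmetric per-row quantization of a 2-D weight matrix to signed `bits`-bit integers.

    Each row gets its own scale so that the row's largest-magnitude weight
    maps to the integer extreme (31 for int6, 127 for int8).
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate fp32 matrix from integer codes and per-row scales."""
    return q.astype(np.float32) * scale

# Mixed-precision split as described in the table above:
rng = np.random.default_rng(0)
mlp_w = rng.standard_normal((4, 8)).astype(np.float32)
emb_w = rng.standard_normal((4, 8)).astype(np.float32)
q6, s6 = quantize_per_row(mlp_w, bits=6)   # int6 path: MLP/attention weights
q8, s8 = quantize_per_row(emb_w, bits=8)   # int8 path: embeddings etc.
```

With round-to-nearest, the per-element reconstruction error is bounded by half the row's scale, which is why the extra two bits of int8 are typically reserved for the outlier-heavy embedding tables.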