# mixed int6/int8

Category: Quantization

- Used in: 60 PRs
- Best BPB: 0.4416
- Avg BPB: 1.1574
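The BPB scores tracked here are presumably bits per byte: the model's total cross-entropy over the evaluation text, converted from nats to bits and normalized by the byte length of that text (an assumption; the leaderboard does not define the metric). A minimal sketch of the conversion:

```python
import math

def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss (in nats) over a byte
    stream into bits per byte (BPB)."""
    return total_nats / (total_bytes * math.log(2))

def bpb_from_token_loss(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Same conversion for a tokenized corpus: weight the mean
    per-token loss by the token-to-byte ratio."""
    return mean_loss_nats * n_tokens / (n_bytes * math.log(2))
```

Under this reading, lower is better, and a tokenizer change alters BPB only through the token-to-byte ratio, which is why BPB is comparable across tokenizations where per-token loss is not.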

## Submissions

| PR | Author | BPB | Record |
|------|--------------------------|--------|--------|
| #39 | nanlliu | 1.2139 | ✓ |
| #70 | jfprincz | 1.1659 | |
| #78 | mtybadger | 1.1858 | |
| #92 | saikrishnarallabandi | 1.1938 | |
| #99 | takhir-iota | 1.1605 | |
| #104 | gwelinder | 1.3358 | |
| #120 | andrewgcodes | 0.9588 | |
| #131 | Billy1900 | 1.2701 | |
| #160 | ChaseWNorton | 1.1623 | |
| #164 | jfprincz | 1.1524 | |
| #176 | GLDRoger | 1.1732 | |
| #198 | jfprincz | 1.1318 | ✓ |
| #222 | ansh-deriv | 1.1601 | |
| #223 | 0xjaishy | 1.1326 | |
| #236 | saml212 | 1.1400 | |
| #254 | timowhite88 | 1.1303 | |
| #274 | haikosys | 1.1403 | |
| #281 | charmquark1984 | 1.1381 | |
| #287 | jfprincz | 1.1271 | ✓ |
| #309 | NewyorkDev | 1.1914 | |
| #312 | chanwoo-park-official | 1.1668 | |
| #315 | jfprincz | 1.1248 | |
| #388 | ElliotSlusky | 1.1231 | |
| #453 | Divyesh-Thirukonda | 1.1248 | |
| #492 | Divyesh-Thirukonda | 1.1591 | |
| #502 | aamodbhatt | 1.5248 | |
| #534 | rarce | 1.1804 | |
| #570 | armmer016 | 1.3434 | |
| #598 | Christopher-Lee-McClendon | 1.1334 | |
| #803 | pentxayc | 0.4416 | |
| #820 | mtybadger | 1.6252 | |
| #898 | pattern4bots | 1.1231 | |
| #958 | shouryamaanjain | 1.1382 | |
| #1042 | nothingLiva | 1.1217 | |
| #1046 | Jayteare | 1.2174 | |
| #1065 | rithunkp | 1.1536 | |
| #1080 | ciach | 1.1228 | |
| #1085 | adityasasidhar | 1.2831 | |
| #1086 | Omrigotlieb | 1.1349 | |
| #1101 | amrayach | 1.1290 | |
| #1142 | ymrohit | 1.1493 | |
| #1166 | Christopher-Lee-McClendon | 1.1347 | |
| #1204 | msisovic | 1.1063 | ✓ |
| #1205 | SergheiBrinza | 1.1431 | |
| #1389 | Rome-1 | 1.7270 | |
| #1474 | shram86 | 1.1434 | |
| #1487 | ndokutovich | 1.0600 | |
| #1495 | shram86 | 1.1077 | |
| #1517 | RulinShao | 1.0632 | |
| #1559 | adityasasidhar | 1.2498 | |
| #1617 | adityasasidhar | 1.2192 | |
| #1635 | PapaFranku4647 | 1.1063 | |
| #1647 | powerpratik | 1.0616 | |
| #1649 | joyceyan | 1.1271 | |
| #1697 | Buld1n | 1.0812 | |
| #1698 | arsenis-cmd | 1.0099 | |
| #1716 | himanshudongre | 1.0788 | |
| #1720 | kiyoaki | 1.0818 | |
| #1737 | sakthivarshans | 1.0723 | |
| #1754 | upascal | 1.0881 | |

PRs marked ✓ held the leaderboard record at the time of submission.

## Hyperparameters Across PRs

| PR | Bits | Scope |
|------|------|-------|
| 39 | 6 | middle layers 3-6 int6; first/last 3 layers int8 |
| 70 | 6 | int6 per-row on MLP and attention projection weights; int8 per-row on embeddings and other tensors |
| 78 | 6 | weights int6, embeddings int8 |
| 92 | 6 | weights and embeddings |
| 99 | 6 | .mlp., .attn.c_q., .attn.c_v., .attn.proj. in int6; .attn.c_k. mostly grouped int8; selected late-layer c_k and tok_emb in fp16 |
| 104 | 6 | all block matrices |
| 120 | | transformer blocks and embeddings |
| 131 | 6 | transformer block weights; embeddings use int8 |
| 160 | 6 | most tensors, with int8 token embedding |
| 164 | 6 | MLP and attention int6; embeddings and bigram int8; controls fp32 |
| 176 | 8 | all weights by default, with middle blocks 3,4,5,6 forced to int6; embeddings and LM head kept fp16 |
| 198 | 6 | MLP and attention int6; embeddings int8 |
| 222 | 6 | layers 2-8 int6; layers 0/1/9 int8 per-row; embeddings fp16 |
| 223 | | MLP+Attn int6, embeddings int8 |
| 236 | 6 | attention + MLP weights; int8 tok_emb |
| 254 | 6 | MLP+attention; embeddings int8; tied embeddings fp16 |
| 274 | 6 | MLP, attention, tied embeddings |
| 281 | 6 | MLP+attention; embeddings int8 |
| 287 | 6 | MLP and attention int6, embeddings int8 |
| 309 | | boundary layers int8, middle layers int6, tied embeddings fp16, control tensors fp32 |
| 312 | 6 | MLP and attention int6; other large tensors int8 |
| 315 | 6 | MLP and attention int6; embeddings int8 |
| 388 | 6 | MLP and attention weights int6 per-row; embeddings int8 per-row |
| 453 | 6 | MLP and attention int6; embeddings int8 |
| 492 | | |
| 502 | 6 | attn, mlp |
| 534 | 6 | layers 1-9 int6, layers 0 and 10 int8 |
| 570 | | all |
| 598 | | int6 per-row for attention projections and MLP weights; int8 per-tensor for layer norms, value embeddings, biases, embedding tables |
| 803 | 6 | model weights |
| 820 | 6 | model weights |
| 898 | 6 | embeddings |
| 958 | | all |
| 1042 | | embeddings |
| 1046 | 6 | MLP and attention weights int6; embeddings and Markov table int8; control tensors fp16 |
| 1065 | 6 | block weights and embeddings |
| 1080 | 6 | MLP and attention; embeddings on int8 path |
| 1085 | 6 | MLP and attention projections, plus smaller tensors in int8/fp16 |
| 1086 | | embeddings, MLP, attention |
| 1101 | 6 | MLP+attn int6; embeddings+other int8 |
| 1142 | 6 | auxiliary embeddings + main trunk int8 |
| 1166 | | per-row weights |
| 1204 | | model weights |
| 1205 | | weights and embeddings |
| 1389 | | shared layers int8, others int6 |
| 1474 | 6 | most tensors with selected sensitive tensors promoted to int8 |
| 1487 | | weights and embeddings |
| 1495 | 6 | most tensors with selected sensitive tensors promoted to int8 |
| 1517 | | all |
| 1559 | | model weights |
| 1617 | 6 | attention/MLP layers |
| 1635 | 6 | model weights |
| 1647 | 6 | model weights |
| 1649 | | MLP+attention int6, embeddings int8 |
| 1698 | 6 | matrices and embeddings |
| 1697 | | attention and MLP matrices, embeddings |
| 1716 | 6 | matrices, embeddings, control tensors, small 2-D matrices |
| 1720 | | int6 for attention and MLP, int8 for embeddings |
| 1737 | | attention/MLP int6, embeddings int8 |
| 1754 | | weights and embeddings |
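Most entries above converge on the same recipe: symmetric per-row int6 quantization for MLP and attention matrices, with embeddings kept at int8. A minimal NumPy sketch of that scheme, under the assumption of round-to-nearest symmetric quantization with one scale per output row (the tensor names and the name-based int6/int8 policy are illustrative, not taken from any particular PR):

```python
import numpy as np

def quantize_per_row(w: np.ndarray, bits: int):
    """Symmetric per-row quantization: one fp scale per row.
    bits=6 -> integer levels in [-31, 31]; bits=8 -> [-127, 127].
    int6 values are stored in an int8 container for simplicity."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

def quantize_model(named_weights: dict):
    """Mixed policy echoing the table: int8 for embedding tables,
    int6 for MLP/attention matrices (name matching is hypothetical)."""
    return {
        name: quantize_per_row(w, bits=8 if "emb" in name else 6)
        for name, w in named_weights.items()
    }
```

With round-to-nearest, the per-element error is bounded by half the row's scale, which is why the int6-everywhere variants above lose so little BPB relative to int8 while saving a further 25% of weight storage.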