mixed int5/int6/int8

Category: Quantization
Used in: 6 PRs
Best BPB: 1.1172
Avg BPB: 1.1934

Hyperparameters Across PRs

| pr_number | bits / scope |
| --- | --- |
| 272 | MLP matrices int5, attention matrices int6, elsewhere int8 |
| 349 | MLP weights int5, attention weights int6, embeddings int8 (FP16 for small tensors) |
| 623 | MLP weights int5, attention weights int6, bigram embeddings int6, token embeddings int8 |
| 678 | MLP int5, attention int6, bigram embeddings int6, token embeddings int8 |
| 1090 | MLP, attention, embeddings |
| 1422 | MLP, attention, embeddings |
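
The common pattern across these PRs is per-group mixed precision: MLP weights get the fewest bits (int5), attention weights slightly more (int6), and embeddings the most (int8). A minimal sketch of that idea, assuming simple symmetric per-tensor quantization (the actual PRs may use different scaling or per-channel schemes; the group names and bit assignments below mirror the table, but the functions themselves are illustrative, not the PRs' code):

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    """Quantize w to a signed `bits`-bit grid with a single per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1          # 15 for int5, 31 for int6, 127 for int8
    scale = float(np.abs(w).max()) / qmax if w.size else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map integer codes back to float for use in the forward pass."""
    return q.astype(np.float32) * scale

# Hypothetical bit assignment per parameter group, following the table above.
BITS_BY_GROUP = {"mlp": 5, "attn": 6, "embed": 8}

rng = np.random.default_rng(0)
params = {
    "mlp": rng.standard_normal((16, 16)).astype(np.float32),
    "attn": rng.standard_normal((16, 16)).astype(np.float32),
    "embed": rng.standard_normal((32, 16)).astype(np.float32),
}

for name, w in params.items():
    q, s = quantize_symmetric(w, BITS_BY_GROUP[name])
    err = np.abs(dequantize(q, s) - w).max()
    print(f"{name}: int{BITS_BY_GROUP[name]}, max abs reconstruction error {err:.4f}")
```

With a per-tensor scale, the worst-case reconstruction error is half a quantization step (scale / 2), so the int5 MLP group trades the most precision for the most size savings while the int8 embeddings stay nearly lossless.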