
mixed int6

Category: Quantization
Used in: 19 PRs
Best BPB: 0.0972
Avg BPB: 1.0655

Hyperparameters Across PRs

| pr_number | bits | scope |
|---|---|---|
| 135 | 6 | MLP and attention weight matrices; FP16 passthrough for tied embeddings and last 2 layers' Key projections |
| 174 | 6 | large MLP and attention matrices |
| 339 | 6 | model weights |
| 398 | 6 | all |
| 581 | 6 | model weights |
| 649 | 6 | all |
| 684 | 6 | model weights |
| 698 | 6 | all |
| 811 | 6 | model weights |
| 922 | 6 | model |
| 993 | 6 | post-training mixed |
| 1052 | 6 | artifact |
| 1427 | 6 | model weights |
| 1438 | 6 | mlp;attn;embed |
| 1465 | 6 | embeddings |
| 1569 | 6 | default export |
| 1664 | 6 | all |
| 1665 | 6 | MLP, attention, and Mamba projection weights |
| 1696 | 6 | attention/MLP banks |
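For readers unfamiliar with the technique, the idea behind int6 weight quantization can be sketched as symmetric rounding of a weight tensor into the signed 6-bit range [-31, 31] with a per-tensor scale. This is a minimal illustrative sketch, not the exact scheme used by any of the PRs above (which vary in scope and may use per-channel scales, asymmetric ranges, or FP16 passthrough for selected tensors):

```python
import numpy as np

def quantize_int6(w):
    """Symmetric per-tensor int6 quantization (illustrative sketch).

    Maps float weights onto integers in [-31, 31] using a single scale.
    """
    max_abs = np.abs(w).max()
    scale = max_abs / 31.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the int6 codes.
    return q.astype(np.float32) * scale

# Round-trip a small random weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int6(w)
w_hat = dequantize(q, s)
# Worst-case rounding error is half a quantization step.
err = np.abs(w - w_hat).max()
```

The "mixed" aspect in the scope column refers to applying this only to selected tensors (e.g. MLP and attention weight matrices) while keeping sensitive tensors, such as embeddings or particular projections, in higher precision.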