mixed int5 (MLP) / int6 (attention) + GPTQ-lite per-row clip search + 3% magnitude pruning + FP16 passthrough for embeddings + zstd-22 compression
Category: Quantization
Used in: 1 PR
Best BPB: 1.1354
Avg BPB: 1.1354
Submissions
Hyperparameters Across PRs
| pr_number | bits | scope |
|---|---|---|
| 562 | — | MLP, attention, embeddings |
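
The recipe named in the title can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from PR 562: the function names, the clip-fraction grid, the symmetric integer range, and the choice to prune before quantizing are all hypothetical. "GPTQ-lite" is read here as a plain per-row search for the clip threshold that minimizes squared reconstruction error, with no Hessian information. The standard-library `zlib` stands in for zstd level 22, and values are stored one per byte rather than bit-packed.

```python
import struct
import zlib

# Assumed mapping from tensor kind to bit width; kinds not listed
# (for example embeddings) pass through as FP16.
BITS_BY_KIND = {"mlp": 5, "attention": 6}


def quantize_row(row, bits, clip_fracs=(1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7)):
    """Symmetric quantization of one row, trying several clip thresholds
    and keeping the one with the lowest squared reconstruction error."""
    qmax = 2 ** (bits - 1) - 1  # 15 for int5, 31 for int6
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return [0] * len(row), 1.0
    best = None
    for frac in clip_fracs:
        scale = frac * max_abs / qmax
        # Values beyond the clip threshold saturate at +/- qmax.
        q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
        err = sum((w - v * scale) ** 2 for w, v in zip(row, q))
        if best is None or err < best[0]:
            best = (err, q, scale)
    return best[1], best[2]


def prune_smallest(weights, fraction=0.03):
    """Zero out roughly the smallest `fraction` of weights by magnitude.
    Ties at the threshold are pruned too, so slightly more may be zeroed."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * fraction)
    if k == 0:
        return [list(row) for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def encode_tensor(kind, weights):
    """Return (payload_bytes, per_row_scales); scales is None for FP16 passthrough."""
    bits = BITS_BY_KIND.get(kind)
    if bits is None:
        flat = [w for row in weights for w in row]
        return struct.pack(f"<{len(flat)}e", *flat), None  # 'e' = IEEE half precision
    payload = bytearray()
    scales = []
    for row in prune_smallest(weights, 0.03):
        q, scale = quantize_row(row, bits)
        payload.extend(v & 0xFF for v in q)  # one byte per value, not bit-packed
        scales.append(scale)
    return bytes(payload), scales


def compress(payload):
    """Stand-in for zstd level 22: zlib at its highest level."""
    return zlib.compress(payload, 9)
```

A call such as `compress(encode_tensor("mlp", weights)[0])` would then yield the compressed payload for one MLP matrix, with the per-row scales stored separately. A real implementation would pack the 5-bit and 6-bit values more tightly and use an actual zstd binding.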