mixed int5 (MLP) / int6 (attention) + GPTQ-lite per-row clip search + 3% magnitude pruning + FP16 passthrough for embeddings + zstd-22 compression
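The recipe above chains several steps. A minimal sketch of two of them, assuming symmetric per-row int5 quantization with a plain MSE objective for the clip search (the "GPTQ-lite" name suggests a simplified GPTQ; full GPTQ also applies Hessian-weighted error compensation, which is omitted here), plus the 3% magnitude pruning:

```python
import numpy as np

def quantize_row(w, bits=5, grid=np.linspace(0.5, 1.0, 40)):
    """Symmetric quantization of one weight row with a clip-scale search.

    Each candidate clip ratio shrinks the quantization range below the
    row's absolute max; the ratio with the lowest reconstruction MSE wins.
    (Simplified stand-in for the per-row clip search named above.)
    """
    qmax = 2 ** (bits - 1) - 1                  # int5 -> codes in [-16, 15]
    amax = max(np.abs(w).max(), 1e-12)          # guard against all-zero rows
    best_err, best_q, best_scale = np.inf, None, None
    for clip in grid:
        scale = clip * amax / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        err = np.square(w - q * scale).sum()
        if err < best_err:
            best_err, best_q, best_scale = err, q.astype(np.int8), scale
    return best_q, best_scale

def prune_magnitude(w, frac=0.03):
    """Zero out the smallest `frac` of weights by magnitude (3% in the recipe)."""
    k = int(frac * w.size)
    if k == 0:
        return w
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    out = w.copy()
    out[np.abs(out) < thresh] = 0.0
    return out
```

In this sketch pruning would run before quantization, so the zeroed weights quantize exactly to code 0 and compress well downstream.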

Category: Quantization
Used in: 1 PR
Best BPB: 1.1354
Avg BPB: 1.1354

Hyperparameters Across PRs

pr_number  bits                      scope
562        5 (MLP) / 6 (attention)  MLP, attention, embeddings
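After quantization, int5 codes do not fill whole bytes, so the recipe's final compression stage benefits from bit-packing first. A sketch under stated assumptions: codes are packed into 5-bit fields, and since the recipe specifies zstd level 22 but `zstd` bindings are not in Python's standard library, stdlib `zlib` stands in here purely for illustration:

```python
import zlib
import numpy as np

def pack_int5(q):
    """Pack int5 codes (range [-16, 15]) into 5-bit fields, then compress.

    The recipe above uses zstd-22; zlib is a stdlib stand-in in this sketch.
    """
    u = (q.astype(np.int16) + 16).astype(np.uint8)   # shift codes to [0, 31]
    bits = np.unpackbits(u[:, None], axis=1)[:, 3:]  # keep low 5 bits per code
    payload = np.packbits(bits.ravel()).tobytes()    # 5*n bits, zero-padded to bytes
    return zlib.compress(payload, 9)
```

Packing 16 codes yields ceil(5*16/8) = 10 payload bytes, versus 16 bytes stored naively one code per byte.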