mixed int5 (MLP) / int6 (attention) + GPTQ-lite per-row clip search + 3% magnitude pruning + FP16 passthrough for embeddings + zstd-22 compression
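The recipe above chains several steps. A minimal sketch of two of them, assuming symmetric per-row int5 quantization with a plain MSE objective for the clip search (the "GPTQ-lite" name suggests a simplified GPTQ; full GPTQ also applies Hessian-weighted error compensation, which is omitted here), plus the 3% magnitude pruning:

```python
import numpy as np

def quantize_row(w, bits=5, grid=np.linspace(0.5, 1.0, 40)):
    """Symmetric quantization of one weight row with a clip-scale search.

    Each candidate clip ratio shrinks the quantization range below the
    row's absolute max; the ratio with the lowest reconstruction MSE wins.
    (Simplified stand-in for the per-row clip search named above.)
    """
    qmax = 2 ** (bits - 1) - 1                  # int5 -> codes in [-16, 15]
    amax = max(np.abs(w).max(), 1e-12)          # guard against all-zero rows
    best_err, best_q, best_scale = np.inf, None, None
    for clip in grid:
        scale = clip * amax / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        err = np.square(w - q * scale).sum()
        if err < best_err:
            best_err, best_q, best_scale = err, q.astype(np.int8), scale
    return best_q, best_scale

def prune_magnitude(w, frac=0.03):
    """Zero out the smallest `frac` of weights by magnitude (3% in the recipe)."""
    k = int(frac * w.size)
    if k == 0:
        return w
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    out = w.copy()
    out[np.abs(out) < thresh] = 0.0
    return out
```

In this sketch pruning would run before quantization, so the zeroed weights quantize exactly to code 0 and compress well downstream.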

Category: Quantization
Used in: 1 PR
Best BPB: 1.1354
Avg BPB: 1.1354

Hyperparameters Across PRs

pr_number  bits                      scope
562        5 (MLP) / 6 (attention)  MLP, attention, embeddings
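After quantization, int5 codes do not fill whole bytes, so the recipe's final compression stage benefits from bit-packing first. A sketch under stated assumptions: codes are packed into 5-bit fields, and since the recipe specifies zstd level 22 but `zstd` bindings are not in Python's standard library, stdlib `zlib` stands in here purely for illustration:

```python
import zlib
import numpy as np

def pack_int5(q):
    """Pack int5 codes (range [-16, 15]) into 5-bit fields, then compress.

    The recipe above uses zstd-22; zlib is a stdlib stand-in in this sketch.
    """
    u = (q.astype(np.int16) + 16).astype(np.uint8)   # shift codes to [0, 31]
    bits = np.unpackbits(u[:, None], axis=1)[:, 3:]  # keep low 5 bits per code
    payload = np.packbits(bits.ravel()).tobytes()    # 5*n bits, zero-padded to bytes
    return zlib.compress(payload, 9)
```

Packing 16 codes yields ceil(5*16/8) = 10 payload bytes, versus 16 bytes stored naively one code per byte.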