PR #1821
open
Hierarchical Quantized Embedding (HQE): Zipf-law mixed precision
by anjing00monyet-arch
val_bpb
1.3825
Architecture
Transformer
Optimizer
—
Artifact Size
—
Training Techniques
Quantization
mixed int4/int6/int8
bits: —
scope: embeddings
STE QAT
bits: —
scope: embeddings
Architecture
other
Hierarchical locality-based mixed-precision embedding tiers assigned by token-frequency rank (Zipf's law): fp16 for the top 256 tokens, int8 for ranks 256-2047, int6 for ranks 2048-16383, and int4 for the long tail (ranks 16384-57599).
parameters: {"tiers":4,"top_fp16":256,"int8_range":[256,2047],"int6_range":[2048,16383],"int4_range":[16384,57600]}
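The tier parameters above can be sketched as a rank-to-precision lookup, and the reported memory saving checked arithmetically. This is a minimal illustration, not the PR's code: `tier_bits` is a hypothetical helper, and token ids are assumed to be ordered by descending frequency.

```python
# HQE tier boundaries from the parameters above: (exclusive upper rank, bits).
# Assumes token id == frequency rank (id 0 is the most frequent token).
TIERS = [
    (256, 16),    # fp16 for the top 256 tokens
    (2048, 8),    # int8 for ranks 256-2047
    (16384, 6),   # int6 for ranks 2048-16383
    (57600, 4),   # int4 for the long tail (ranks 16384-57599)
]

def tier_bits(token_id: int) -> int:
    """Return the embedding precision (in bits) for a token id."""
    for upper, bits in TIERS:
        if token_id < upper:
            return bits
    raise ValueError("token id out of vocabulary range")

# Average bits per embedding element across the 57600-token vocabulary.
bounds = [0] + [upper for upper, _ in TIERS[:-1]]  # tier lower bounds
total_bits = sum(bits * (upper - lo) for (upper, bits), lo in zip(TIERS, bounds))
avg_bits = total_bits / TIERS[-1][0]  # ~4.68 bits
```

With these tier sizes the average comes out to roughly 4.7 bits per element, i.e. about a 70% reduction versus a uniform fp16 (16-bit) table, consistent with the figure reported below.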
Novel Contributions
- Hierarchical Quantized Embedding (HQE) for embeddings
- Zipf-law-based token frequency tiering
- 4-tier mixed precision embedding scheme
- Straight-Through Estimator for gradient flow through quantization
- Reported ~70% embedding memory reduction versus fp16 baseline
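The Straight-Through Estimator listed above can be sketched as a custom autograd function: quantize in the forward pass, pass gradients through unchanged in the backward pass. This is an illustrative PyTorch sketch assuming symmetric per-tensor fake quantization; the PR's actual scaling and per-tier handling may differ.

```python
import torch

class STEQuantize(torch.autograd.Function):
    """Fake-quantize a tensor to a given bit width with a straight-through gradient."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, bits: int) -> torch.Tensor:
        # Symmetric uniform quantization: scale chosen from the tensor's max
        # magnitude so values map onto 2^(bits-1)-1 positive levels
        # (an illustrative choice, not necessarily the PR's scheme).
        scale = x.detach().abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
        return torch.round(x / scale) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-Through Estimator: identity gradient w.r.t. x,
        # no gradient for the integer bit-width argument.
        return grad_output, None

# Usage: fake-quantize an embedding row to int4 precision during QAT.
emb = torch.randn(8, requires_grad=True)
q = STEQuantize.apply(emb, 4)
```

Because the backward pass ignores the rounding step, the embedding table still receives full-precision gradient updates while the forward pass sees quantized values, which is what lets training converge through the non-differentiable round.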