PR #1821

open

Hierarchical Quantized Embedding (HQE): Zipf-law mixed precision

by anjing00monyet-arch
val_bpb
1.3825
Architecture
Transformer
Optimizer
Artifact Size

Training Techniques

Quantization
  • mixed int4/int6/int8 (bits: null; scope: embeddings)
  • STE QAT (bits: null; scope: embeddings)
Architecture
other
Hierarchical mixed-precision embedding with tiers assigned by token frequency rank (Zipf's law): fp16 for the 256 most frequent tokens, int8 for ranks 256-2047, int6 for ranks 2048-16383, and int4 for the long tail (ranks 16384-57599).
parameters: {"tiers":4,"top_fp16":256,"int8_range":[256,2047],"int6_range":[2048,16383],"int4_range":[16384,57600]}
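A minimal numpy sketch of the tiering scheme above. The tier boundaries and vocabulary size (57600) come from the PR's parameter dump; the `tier_bits` and `fake_quantize` helpers and the per-row symmetric scaling are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

# Tier boundaries (rank-ordered vocabulary) from the PR parameters:
# (start_rank, end_rank_exclusive, storage bit-width)
TIERS = [
    (0, 256, 16),       # fp16 for the 256 most frequent tokens
    (256, 2048, 8),     # int8
    (2048, 16384, 6),   # int6
    (16384, 57600, 4),  # int4 long tail
]

def tier_bits(rank: int) -> int:
    """Storage bit-width for a token of the given frequency rank."""
    for lo, hi, bits in TIERS:
        if lo <= rank < hi:
            return bits
    raise ValueError(f"rank {rank} outside vocabulary")

def fake_quantize(row: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-row fake quantization (quantize, then dequantize).
    Illustrative helper; the PR's scaling scheme may differ."""
    if bits >= 16:
        return row.astype(np.float16).astype(np.float32)
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for int8, 7 for int4
    scale = float(np.abs(row).max()) / qmax or 1.0
    q = np.clip(np.round(row / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)

# Storage estimate: bits per embedding dimension, summed over the vocabulary.
total_bits = sum((hi - lo) * bits for lo, hi, bits in TIERS)
baseline_bits = 57600 * 16                      # fp16 for every token
print(f"embedding memory vs fp16: {total_bits / baseline_bits:.1%}")
```

Summing the per-tier bit budgets gives roughly 29% of the fp16 footprint, consistent with the reported ~70% memory reduction.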

Novel Contributions

  • Hierarchical Quantized Embedding (HQE) for embeddings
  • Zipf-law-based token frequency tiering
  • 4-tier mixed precision embedding scheme
  • Straight-Through Estimator for gradient flow through quantization
  • Reported ~70% embedding memory reduction versus fp16 baseline
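The straight-through-estimator bullet can be sketched as follows. This is the generic STE pattern, not the PR's implementation: the forward pass applies the (non-differentiable) rounding, and the backward pass treats that rounding as the identity so the gradient reaches the full-precision master weights unchanged. Both helpers are illustrative assumptions:

```python
import numpy as np

def ste_fake_quant(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Forward pass: symmetric fake quantization (quantize, then dequantize).
    The round() here has zero gradient almost everywhere, which is why
    QAT needs the straight-through estimator below."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(x).max()) / qmax or 1.0
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def ste_backward(grad_out: np.ndarray) -> np.ndarray:
    """Backward pass under STE: pretend round() was the identity, so the
    upstream gradient flows through to the fp32 master weights unchanged."""
    return grad_out
```

In an autograd framework the same behavior is commonly written as `x + (ste_fake_quant(x) - x).detach()`, which keeps the quantized values in the forward pass while the gradient bypasses the rounding.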