PR #1821

open

Hierarchical Quantized Embedding (HQE): Zipf-law mixed precision

by anjing00monyet-arch
val_bpb
1.3825
Architecture
Transformer
Optimizer
Artifact Size

Training Techniques

Quantization
  • mixed int4/int6/int8 (bits: null; scope: embeddings)
  • STE QAT (bits: null; scope: embeddings)
Architecture
other
Hierarchical mixed-precision embedding with tiers assigned by token frequency rank (Zipf's law): fp16 for the 256 most frequent tokens, int8 for ranks 256-2047, int6 for ranks 2048-16383, and int4 for the long tail (ranks 16384-57599).
parameters: {"tiers":4,"top_fp16":256,"int8_range":[256,2047],"int6_range":[2048,16383],"int4_range":[16384,57600]}
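A minimal numpy sketch of the tiering scheme above. The tier boundaries and vocabulary size (57600) come from the PR's parameter dump; the `tier_bits` and `fake_quantize` helpers and the per-row symmetric scaling are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

# Tier boundaries (rank-ordered vocabulary) from the PR parameters:
# (start_rank, end_rank_exclusive, storage bit-width)
TIERS = [
    (0, 256, 16),       # fp16 for the 256 most frequent tokens
    (256, 2048, 8),     # int8
    (2048, 16384, 6),   # int6
    (16384, 57600, 4),  # int4 long tail
]

def tier_bits(rank: int) -> int:
    """Storage bit-width for a token of the given frequency rank."""
    for lo, hi, bits in TIERS:
        if lo <= rank < hi:
            return bits
    raise ValueError(f"rank {rank} outside vocabulary")

def fake_quantize(row: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-row fake quantization (quantize, then dequantize).
    Illustrative helper; the PR's scaling scheme may differ."""
    if bits >= 16:
        return row.astype(np.float16).astype(np.float32)
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for int8, 7 for int4
    scale = float(np.abs(row).max()) / qmax or 1.0
    q = np.clip(np.round(row / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)

# Storage estimate: bits per embedding dimension, summed over the vocabulary.
total_bits = sum((hi - lo) * bits for lo, hi, bits in TIERS)
baseline_bits = 57600 * 16                      # fp16 for every token
print(f"embedding memory vs fp16: {total_bits / baseline_bits:.1%}")
```

Summing the per-tier bit budgets gives roughly 29% of the fp16 footprint, consistent with the reported ~70% memory reduction.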

Novel Contributions

  • Hierarchical Quantized Embedding (HQE) for embeddings
  • Zipf-law-based token frequency tiering
  • 4-tier mixed precision embedding scheme
  • Straight-Through Estimator for gradient flow through quantization
  • Reported ~70% embedding memory reduction versus fp16 baseline
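The straight-through-estimator bullet can be sketched as follows. This is the generic STE pattern, not the PR's implementation: the forward pass applies the (non-differentiable) rounding, and the backward pass treats that rounding as the identity so the gradient reaches the full-precision master weights unchanged. Both helpers are illustrative assumptions:

```python
import numpy as np

def ste_fake_quant(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Forward pass: symmetric fake quantization (quantize, then dequantize).
    The round() here has zero gradient almost everywhere, which is why
    QAT needs the straight-through estimator below."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(x).max()) / qmax or 1.0
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def ste_backward(grad_out: np.ndarray) -> np.ndarray:
    """Backward pass under STE: pretend round() was the identity, so the
    upstream gradient flows through to the fp32 master weights unchanged."""
    return grad_out
```

In an autograd framework the same behavior is commonly written as `x + (ste_fake_quant(x) - x).detach()`, which keeps the quantized values in the forward pass while the gradient bypasses the rounding.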