PR #898

open

Frequency-Weighted Embedding Quantization (1.1231 BPB)

by pattern4bots
val_bpb
1.1231
Architecture
Transformer
Artifact Size
15.8 MB

Training Techniques

Quantization
mixed int6/int8
bits: 6
scope: embeddings
Evaluation
sliding window eval
parameters: null
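The eval entry names a sliding-window scheme but publishes no parameters. A common version of that scheme re-scores overlapping windows while counting only the newest tokens of each window, so every scored token keeps long context; the helper below is a hypothetical sketch of that pattern (`token_nll`, `window`, and `stride` are assumptions, not values from the PR), together with the bits-per-byte conversion behind `val_bpb`.

```python
import math

def sliding_window_nll(tokens, window, stride, token_nll):
    """Strided sliding-window scoring (hypothetical parameters).
    Each window of `window` tokens is re-scored, but only the tokens
    not yet scored contribute, so scored tokens keep long context.
    `token_nll(ctx)` must return per-token NLLs in nats for ctx[1:]."""
    nlls, scored = [], 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + window, len(tokens))
        per_token = token_nll(tokens[begin:end])   # len(ctx) - 1 values
        nlls.extend(per_token[-(end - scored):])   # keep only new tokens
        scored = end
        if end == len(tokens):
            break
    return nlls

def bits_per_byte(nlls_nats, n_bytes):
    # val_bpb: total NLL converted from nats to bits, per UTF-8 byte
    return sum(nlls_nats) / (math.log(2) * n_bytes)
```

With `stride < window` each forward pass wastes some compute on already-scored tokens, which is the usual trade-off this style of eval accepts for longer effective context.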

Novel Contributions

  • Frequency-weighted embedding quantization keyed to token frequency
  • int8 precision for the embedding rows of the 100 most frequent tokens
  • int6 precision for all remaining embedding rows
  • A separate dequantization path for the mixed-precision embedding table
  • Token-frequency analysis to concentrate precision where it affects the most text
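The scheme above can be sketched in a few lines. This is a minimal NumPy illustration, not the PR's code: function names and the per-row symmetric absmax scaling are my assumptions, and the int6 rows are held in an int8 container (a real artifact would bit-pack them to get the size savings).

```python
import numpy as np

def quantize_embeddings(emb, token_counts, top_k=100):
    """Frequency-weighted mixed-precision quantization (sketch).
    Rows for the `top_k` most frequent tokens are quantized to int8;
    all other rows to int6 (int8 container, range [-32, 31]).
    Each row gets its own symmetric absmax scale (my assumption)."""
    vocab, _ = emb.shape
    is_hi = np.zeros(vocab, dtype=bool)
    is_hi[np.argsort(-token_counts)[:top_k]] = True  # top-k by frequency

    q = np.empty_like(emb, dtype=np.int8)
    scales = np.empty(vocab, dtype=np.float32)
    for i in range(vocab):
        qmax = 127 if is_hi[i] else 31               # int8 vs int6 range
        absmax = np.abs(emb[i]).max()
        scales[i] = absmax / qmax if absmax > 0 else 1.0
        q[i] = np.clip(np.round(emb[i] / scales[i]), -qmax - 1, qmax)
    return q, scales, is_hi

def dequantize(q, scales):
    # One dequant path serves both precisions, since each row's scale
    # already encodes whether it was quantized to 127 or 31 levels.
    return q.astype(np.float32) * scales[:, None]
```

Because per-row scales make dequantization uniform, the "separate path" the PR mentions reduces to bookkeeping: only the bit-unpacking of int6 rows would differ from the int8 rows in a packed artifact.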