val_bpb: 1.1231
Architecture: Transformer
Optimizer: —
Artifact Size: 15.8 MB

Training Techniques
- Quantization: mixed int6/int8 (bits: 6, scope: embeddings)

Evaluation
- sliding window eval (parameters: null)
Novel Contributions
- Frequency-weighted embedding quantization driven by token-frequency statistics
- int8 precision assigned to the top 100 most frequent tokens
- int6 precision assigned to the remaining embedding rows
- A separate dequantization path for mixed-precision embeddings
- Token-frequency analysis used to concentrate precision where it affects the most text
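The mixed-precision scheme above can be sketched as follows. This is a minimal illustration, not the report's implementation: the function names, the symmetric per-row absmax scaling, and the NumPy storage layout (int6 values held in int8 containers) are all assumptions. The top 100 rows by token frequency get int8 range [-127, 127]; the rest get int6 range [-32, 31]; dequantization multiplies each row by its stored scale.

```python
import numpy as np

def quantize_embeddings(emb, token_freqs, top_k=100):
    """Quantize embedding rows with mixed precision (hypothetical sketch).

    Rows for the top_k most frequent tokens use int8 ([-127, 127]);
    all other rows use int6 ([-32, 31]), stored in int8 containers.
    Symmetric per-row absmax scaling is assumed.
    """
    order = np.argsort(token_freqs)[::-1]        # tokens, most frequent first
    hi_prec = set(order[:top_k].tolist())
    q = np.empty(emb.shape, dtype=np.int8)
    scales = np.empty(emb.shape[0], dtype=np.float32)
    is_int8 = np.zeros(emb.shape[0], dtype=bool)
    for i, row in enumerate(emb):
        amax = max(float(np.abs(row).max()), 1e-8)  # avoid divide-by-zero
        if i in hi_prec:
            scales[i] = amax / 127.0
            q[i] = np.clip(np.round(row / scales[i]), -127, 127)
            is_int8[i] = True
        else:
            scales[i] = amax / 31.0
            q[i] = np.clip(np.round(row / scales[i]), -32, 31)
    return q, scales, is_int8

def dequantize_embeddings(q, scales):
    """Dequantize: the per-row scale already encodes each row's bit width."""
    return q.astype(np.float32) * scales[:, None]
```

Note that a single dequantization kernel suffices here because the per-row scale absorbs the precision difference; a truly separate path per precision (as the contributions list suggests) would matter if the two groups were stored in different buffers or bit-packed.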