PR #1059

Status
open

Int5 MLP + Int6 Attn + zstd-22, val_bpb 1.1996

by edidisheng
val_bpb
1.1996
Architecture
Transformer
Optimizer
Artifact Size

Training Techniques

Quantization
mixed int5/int6
bits: null
scope: MLP int5, attention int6
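A minimal sketch of the mixed-precision scheme above, assuming simple symmetric per-tensor quantization (the PR does not specify the exact method; the function and variable names here are hypothetical):

```python
def quantize_symmetric(weights, bits):
    """Symmetric per-tensor quantization to a signed `bits`-bit integer range."""
    qmax = 2 ** (bits - 1) - 1                    # 15 for int5, 31 for int6
    amax = max(abs(w) for w in weights) or 1.0    # avoid divide-by-zero on all-zero tensors
    scale = amax / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

# Per the PR: int5 for MLP weights, int6 for attention weights.
mlp_q, mlp_scale = quantize_symmetric([0.8, -1.2, 0.05, 0.4], bits=5)   # values in [-16, 15]
attn_q, attn_scale = quantize_symmetric([0.8, -1.2, 0.05, 0.4], bits=6) # values in [-32, 31]
```

Dequantization multiplies each integer back by its tensor's scale; the extra bit for attention reflects its lower tolerance for quantization error.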
Compression
zstd
level: 22
Architecture
MLP3x
Increased MLP expansion ratio from 2x to 3x.
parameters: {"mlp_mult":3}
weight tying
Used tied embeddings.
parameters: null
KV head count
Attention configuration uses 4 KV heads.
parameters: {"kv_heads":4}
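The architecture changes above can be summarized in a small config sketch (field names are hypothetical; the PR only reports the values):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layers: int = 11           # depth increased to 11 layers
    mlp_mult: int = 3            # MLP expansion ratio raised from 2x to 3x
    kv_heads: int = 4            # attention uses 4 KV heads
    tie_embeddings: bool = True  # input/output embeddings share one weight matrix

cfg = ModelConfig()
```

Fewer KV heads and tied embeddings both cut parameter count, which is what buys room for the deeper, wider-MLP model under the size limit.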

Novel Contributions

  • Mixed precision quantization with int5 for MLP blocks and int6 for attention blocks
  • Switched artifact compression from zlib to zstd level 22
  • Increased MLP expansion to 3x and model depth to 11 layers while staying under the size limit
  • Used tied embeddings
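To see how the mixed int5/int6 split helps stay under the size limit, a back-of-the-envelope estimator (the parameter counts below are placeholders for illustration, not the PR's actual model):

```python
def artifact_bits(n_mlp_params, n_attn_params):
    """Raw quantized size in bits: int5 for MLP weights, int6 for attention weights."""
    return n_mlp_params * 5 + n_attn_params * 6

# Placeholder counts for illustration only.
n_mlp, n_attn = 10_000_000, 4_000_000
raw_mb = artifact_bits(n_mlp, n_attn) / 8 / 1e6
print(f"{raw_mb:.1f} MB before compression")  # zstd-22 then shrinks this further
```

Compared with a uniform int6 (or int8) scheme, shaving one bit off every MLP weight is significant because the 3x expansion puts most parameters in the MLP blocks.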