PR #1059

Status
open

Int5 MLP + Int6 Attn + zstd-22, val_bpb 1.1996

by edidisheng
val_bpb
1.1996
Architecture
Transformer
Optimizer
Artifact Size

Training Techniques

Quantization
mixed int5/int6
bits: null
scope: MLP int5, attention int6
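A minimal sketch of the mixed-precision scheme above, assuming simple symmetric per-tensor quantization (the PR does not specify the exact method; the function and variable names here are hypothetical):

```python
def quantize_symmetric(weights, bits):
    """Symmetric per-tensor quantization to a signed `bits`-bit integer range."""
    qmax = 2 ** (bits - 1) - 1                    # 15 for int5, 31 for int6
    amax = max(abs(w) for w in weights) or 1.0    # avoid divide-by-zero on all-zero tensors
    scale = amax / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

# Per the PR: int5 for MLP weights, int6 for attention weights.
mlp_q, mlp_scale = quantize_symmetric([0.8, -1.2, 0.05, 0.4], bits=5)   # values in [-16, 15]
attn_q, attn_scale = quantize_symmetric([0.8, -1.2, 0.05, 0.4], bits=6) # values in [-32, 31]
```

Dequantization multiplies each integer back by its tensor's scale; the extra bit for attention reflects its lower tolerance for quantization error.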
Compression
zstd
level: 22
Architecture
MLP3x
Increased MLP expansion ratio from 2x to 3x.
parameters: {"mlp_mult":3}
weight tying
Used tied embeddings.
parameters: null
KV head count
Attention configuration uses 4 KV heads.
parameters: {"kv_heads":4}
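The architecture changes above can be summarized in a small config sketch (field names are hypothetical; the PR only reports the values):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layers: int = 11           # depth increased to 11 layers
    mlp_mult: int = 3            # MLP expansion ratio raised from 2x to 3x
    kv_heads: int = 4            # attention uses 4 KV heads
    tie_embeddings: bool = True  # input/output embeddings share one weight matrix

cfg = ModelConfig()
```

Fewer KV heads and tied embeddings both cut parameter count, which is what buys room for the deeper, wider-MLP model under the size limit.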

Novel Contributions

  • Mixed precision quantization with int5 for MLP blocks and int6 for attention blocks
  • Switched artifact compression from zlib to zstd level 22
  • Increased MLP expansion to 3x and model depth to 11 layers while staying under the size limit
  • Used tied embeddings
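To see how the mixed int5/int6 split helps stay under the size limit, a back-of-the-envelope estimator (the parameter counts below are placeholders for illustration, not the PR's actual model):

```python
def artifact_bits(n_mlp_params, n_attn_params):
    """Raw quantized size in bits: int5 for MLP weights, int6 for attention weights."""
    return n_mlp_params * 5 + n_attn_params * 6

# Placeholder counts for illustration only.
n_mlp, n_attn = 10_000_000, 4_000_000
raw_mb = artifact_bits(n_mlp, n_attn) / 8 / 1e6
print(f"{raw_mb:.1f} MB before compression")  # zstd-22 then shrinks this further
```

Compared with a uniform int6 (or int8) scheme, shaving one bit off every MLP weight is significant because the 3x expansion puts most parameters in the MLP blocks.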