PR #1733 (open)

Non-record: Ternary MLP Quantization — Void Fraction (val_bpb 1.3262, 10.9 MB)

by G3sparky
val_bpb: 1.3262
Architecture: Transformer
Optimizer:
Artifact Size: 10.9 MB

Training Techniques

Quantization
  • GPTQ (bits: 1, scope: MLP)
  • int6 (bits: 6, scope: attention)
  • int8 (bits: 8, scope: embeddings)

Architecture
  • depth recurrence: uses depth recurrence in the base architecture (parameters: null)
  • parallel residuals: uses parallel residual connections (parameters: null)

Test-Time Training
  • score-first TTT (parameters: null)

Sequence Length
  • sequence_length (train_length: 8192, eval_length: null)
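The int6 and int8 entries above amount to symmetric per-tensor integer quantization. A minimal sketch, not the PR's actual code: the `quantize_sym`/`dequantize` helpers and the random weight tensor are illustrative assumptions.

```python
import numpy as np

def quantize_sym(w: np.ndarray, bits: int):
    """Symmetric per-tensor quantization to signed integers.

    Maps w onto integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    with a single shared scale; returns (q, scale).
    """
    qmax = 2 ** (bits - 1) - 1                      # 31 for int6, 127 for int8
    scale = float(np.abs(w).max()) / qmax or 1.0    # guard all-zero tensors
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate float tensor from integers and scale."""
    return q.astype(np.float32) * scale

# Hypothetical stand-in for an attention weight matrix.
rng = np.random.default_rng(0)
w_attn = rng.normal(size=(64, 64)).astype(np.float32)

q6, s6 = quantize_sym(w_attn, bits=6)   # int6, as the card lists for attention
w_rec = dequantize(q6, s6)

# Round-to-nearest error is bounded by half a quantization step.
assert np.abs(w_attn - w_rec).max() <= s6 / 2 + 1e-6
```

The same routine with `bits=8` covers the embedding case; only the integer range (and hence the scale) changes.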

Novel Contributions

  • Ternary {-1, 0, +1} GPTQ quantization for MLP layers
  • Void fraction thesis: roughly 30% of weights converge to near-zero values
  • Mixed-precision scheme with ternary MLP, int6 attention, and int8 embeddings
  • Post-hoc ternary quantization as a proof-of-concept for the competition wish list
  • Hessian-aware GPTQ adapted to ternary rounding
  • Compact 10.9 MB artifact under the 16 MB cap
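The ternary rounding and the void-fraction measurement can be illustrated with a plain round-to-nearest ternarizer using an absmean scale. This is a simplification: the PR describes Hessian-aware GPTQ rounding, which this sketch omits, and the `ternarize` helper and Gaussian weights are hypothetical.

```python
import numpy as np

def ternarize(w: np.ndarray):
    """Round-to-nearest ternary quantization (no Hessian correction).

    The scale is the mean absolute value (absmean); each weight maps to
    the nearest of {-scale, 0, +scale}, stored as {-1, 0, +1}.
    """
    scale = float(np.abs(w).mean())
    t = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return t, scale

# Hypothetical stand-in for an MLP weight matrix.
rng = np.random.default_rng(0)
w_mlp = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)

t, s = ternarize(w_mlp)
void_fraction = float((t == 0).mean())   # share of weights rounded to zero
print(f"void fraction: {void_fraction:.2f}")
```

For i.i.d. Gaussian weights with an absmean scale, round-to-nearest zeroes roughly 31% of entries (the zero bin covers |w| < 0.5·E|w| ≈ 0.40σ), which is in the same ballpark as the ~30% void fraction the PR reports for trained weights.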