PR #1433

open

[Non-record] Codebooks! - val_bpb 1.2067 (3-seed mean)

by mtybadger
val_bpb: 1.2067
Architecture: Transformer
Optimizer:
Artifact Size:

Training Techniques

Quantization
  • codebook quant (bits: null, scope: MLP/attn weights)
  • int8 (bits: 8, scope: outlier tensors)
  • QAT (bits: null, scope: codebook weights)
  • GPTQ (bits: null, scope: codebook assignment/scales)
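As a rough illustration of per-block codebook quantization, the sketch below maps each weight block to its nearest codebook vector under a per-block scale. This is a minimal pure-Python simplification, not the PR's EP8 lattice implementation; `quantize_block`, `dequantize_block`, and the least-squares scale fit are our own naming and choices.

```python
def quantize_block(block, codebook):
    """Map one weight block to its nearest codebook vector under a
    per-block scale s, minimizing ||block - s*c||^2 over entries c.
    For a fixed c, the optimal scale is s = <block, c> / <c, c>."""
    best = None
    for idx, c in enumerate(codebook):
        cc = sum(x * x for x in c)
        if cc == 0:
            continue  # skip degenerate all-zero entries
        s = sum(w * x for w, x in zip(block, c)) / cc
        err = sum((w - s * x) ** 2 for w, x in zip(block, c))
        if best is None or err < best[0]:
            best = (err, idx, s)
    return best[1], best[2]  # (codebook index, per-block scale)

def dequantize_block(idx, scale, codebook):
    """Reconstruct an approximate block from its stored index and scale."""
    return [scale * x for x in codebook[idx]]
```

Per the listing above, tensors where this assignment error stays too high would instead take the plain int8 outlier path.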
Other
  • Hadamard transform with random sign-flip and rotation to make blocked weights more isotropic before codebook assignment (parameters: null)
  • Hessian-aware codebook index and scale assignment using GPTQ machinery and collected Hessians (parameters: null)
  • Approximate codebook quantization run every 16 steps during training to provide quantization-aware feedback (parameters: {"interval_steps":16})
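The sign-flipped Hadamard preprocessing listed above can be sketched as follows. This assumes power-of-two block lengths and random ±1 signs; the function names are ours, and the PR's actual rotation may differ in detail.

```python
import math

def hadamard(vec):
    """Fast Walsh-Hadamard transform of a list whose length is a
    power of two, normalized so the transform is orthonormal."""
    vec = list(vec)
    n = len(vec)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = vec[j], vec[j + h]
                vec[j], vec[j + h] = x + y, x - y
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in vec]

def randomized_hadamard(vec, signs):
    """Random sign-flip followed by a Hadamard transform: spreads
    energy across coordinates so weight blocks look more isotropic
    before codebook assignment (sketch of the idea, not the PR's
    exact rotation)."""
    return hadamard([s * v for s, v in zip(signs, vec)])
```

A spiky input such as a one-hot block is flattened into equal-magnitude coordinates, which is the isotropy property the codebook assignment benefits from.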
Regularization
  • L2 loss (parameters: {"applied_to":"approximate codebook quantization","schedule":"turned on at end of training"})
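A minimal sketch of the periodic quantization-aware feedback with its auxiliary L2 penalty, assuming the schedule simply leaves the penalty weight at zero until late in training. The coarse-grid rounding is a stand-in for approximate codebook quantization, and the function name and `aux_weight` argument are hypothetical.

```python
def qat_aux_loss(weights, step, interval_steps=16, aux_weight=0.0):
    """Every `interval_steps` optimizer steps, build an approximate
    quantized copy of the weights and return an auxiliary L2 penalty
    pulling the weights toward it. `aux_weight` stays 0.0 until the
    end-of-training schedule turns the penalty on."""
    if step % interval_steps != 0:
        return 0.0
    # Stand-in for approximate codebook quantization: snap to a coarse grid.
    quantized = [round(w * 4) / 4 for w in weights]
    return aux_weight * sum((w - q) ** 2 for w, q in zip(weights, quantized))
```

Running the approximation only every 16 steps keeps the feedback cheap relative to quantizing on every step.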

Novel Contributions

  • Fixed EP8 lattice codebook for weight compression
  • Hadamard transform preprocessing for blocked weights
  • Hessian-aware codebook assignment and scale selection
  • Lightweight quantization-aware proxy training with periodic approximate quantization and auxiliary L2 loss
  • Outlier-path fallback to int8 for the hardest tensors
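The Hessian-aware assignment above can be illustrated with a diagonal-Hessian-weighted variant of the nearest-codebook criterion: errors on high-curvature coordinates cost more. This is a simplification of the GPTQ-style machinery the PR describes (which uses full collected Hessians); the name and signature are ours.

```python
def hessian_aware_assign(block, codebook, hdiag):
    """Pick the codebook index and scale minimizing the weighted error
    sum_i h_i * (w_i - s * c_i)^2, where h_i is the diagonal Hessian
    estimate for coordinate i. For fixed c, the optimal scale is
    s = sum(h*w*c) / sum(h*c*c)."""
    best = None
    for idx, c in enumerate(codebook):
        den = sum(h * x * x for h, x in zip(hdiag, c))
        if den == 0:
            continue  # skip entries with no weighted support
        s = sum(h * w * x for h, w, x in zip(hdiag, block, c)) / den
        err = sum(h * (w - s * x) ** 2 for h, w, x in zip(hdiag, block, c))
        if best is None or err < best[0]:
            best = (err, idx, s)
    return best[1], best[2]  # (codebook index, scale)
```

With a large Hessian entry on one coordinate, a codebook vector that perturbs that coordinate is rejected even if it is closer in plain Euclidean distance.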