PR #1805 (open)

SP8192 + Compression-Aware QAT on PR #1493, 3-seed val_bpb 1.10314

val_bpb: 1.1031
Architecture: Transformer
Optimizer:
Artifact Size: 15,999,417 B (≈16 MB)

Training Techniques

  • Quantization: QAT
      ◦ bits: 6
      ◦ scope: large 2D linear matrices
  • Regularization: entropy penalty
      ◦ parameters: {"target": "soft int6 histogram", "lambda": 0.001, "beta": 10, "warmup": 200}
  • Compression: zstd
      ◦ level: null
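The 6-bit QAT setting above can be sketched as a fake-quant round trip over the large weight matrices. This is a minimal illustration, not the PR's code: the symmetric per-tensor absmax scaling and the `fake_quant_int6` helper are assumptions, since the PR does not state the exact quantizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_quant_int6(w):
    """Quantize-dequantize to a signed 6-bit grid (illustrative sketch).

    Symmetric per-tensor absmax scaling is an assumption; the PR only
    specifies bits=6 and the scope (large 2D linear matrices)."""
    qmax = 31                                  # signed int6 range: [-32, 31]
    absmax = np.abs(w).max()
    scale = absmax / qmax if absmax > 0 else 1.0
    q = np.clip(np.round(w / scale), -32, qmax)
    return q * scale                           # dequantized fake-quant weights

w = rng.standard_normal((64, 64))              # stand-in for a large 2D matrix
wq = fake_quant_int6(w)
```

In training-time QAT, the forward pass uses `wq` while gradients flow to `w` via a straight-through estimator; only the round-trip math is shown here.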

Novel Contributions

  • Compression-aware QAT with a differentiable entropy surrogate over soft int6 histograms
  • Applying the surrogate only after a 200-step warmup, and only to large 2D linear matrices
  • Demonstrating stable behavior across 3 seeds for compression-aware training
  • Research pivot from 3DCF-style compression ideas to a scoreable CompQAT branch on top of PR #1493
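One plausible reading of the entropy surrogate described above is the entropy of a softmax-based soft histogram over the 64 int6 levels, gated off during warmup. The sketch below is a reconstruction under assumptions: only lambda=0.001, beta=10, and warmup=200 come from the PR; `soft_int6_entropy_penalty`, the use of `beta` as a bin-assignment temperature, and the gating logic are illustrative.

```python
import numpy as np

def soft_int6_entropy_penalty(w, beta=10.0, lam=1e-3, step=0, warmup=200):
    """Entropy of a soft (differentiable) histogram over 64 int6 levels.

    Reconstruction under assumptions; the PR gives only the
    hyperparameters, not the exact formulation."""
    if step < warmup:
        return 0.0                             # surrogate off during warmup
    scale = np.abs(w).max() / 31 + 1e-12       # map weights onto the int6 grid
    centers = np.arange(-32, 32) * scale       # the 64 quantization levels
    d2 = (w.reshape(-1, 1) - centers) ** 2
    logits = -beta * d2 / scale**2             # larger beta -> harder assignment
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)          # soft one-hot bin membership
    p = a.mean(axis=0)                         # soft histogram (sums to 1)
    entropy = -(p * np.log(p + 1e-12)).sum()
    return lam * entropy                       # term added to the training loss

w = np.random.default_rng(1).standard_normal(512)
pen = soft_int6_entropy_penalty(w, step=500)
```

Because the histogram is built from softmax assignments rather than hard binning, the penalty is differentiable in `w`, which is what lets it act as a compression-aware regularizer during QAT.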