| Field | Value |
| --- | --- |
| val_bpb | 1.1031 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact Size | 15,999,417 B |
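Here, val_bpb reads as validation bits per byte: the validation cross-entropy normalized by the raw byte count of the evaluated text. A minimal conversion sketch, assuming the cross-entropy is accumulated in nats (the function name is illustrative):

```python
import math

def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Summed validation cross-entropy (nats) -> bits per byte of raw text."""
    return total_nats / (math.log(2) * total_bytes)
```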
**Training Techniques**

- **Quantization:** QAT, bits: 6, scope: large 2D linear matrices (see the sketch after this list)
- **Regularization:** entropy penalty, parameters: `{"target": "soft int6 histogram", "lambda": 0.001, "beta": 10, "warmup": 200}`
- **Compression:** zstd, level: null
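How the training-time pieces fit together: a minimal PyTorch sketch, not the author's implementation. The function names, the symmetric absmax scaling, and the size threshold standing in for "large 2D linear matrices" are assumptions; the int6 range, the soft-histogram entropy target, and the hyperparameters (`lambda = 0.001`, `beta = 10`, `warmup = 200`) come from the parameters above.

```python
import torch

def fake_quant_int6(w: torch.Tensor) -> torch.Tensor:
    """QAT fake-quantization to int6 via a straight-through estimator.

    Symmetric absmax scaling onto the int6 code range [-32, 31]; the
    non-differentiable round() is bypassed in backward by the detach trick.
    """
    scale = w.detach().abs().max().clamp_min(1e-12) / 31.0
    q = (w / scale).round().clamp(-32, 31) * scale
    return w + (q - w).detach()  # forward: quantized, backward: identity

def soft_int6_entropy(w: torch.Tensor, beta: float) -> torch.Tensor:
    """Differentiable entropy of a soft histogram over the 64 int6 codes.

    Each scaled weight is softly assigned to every code by a softmax over
    -beta * squared distance; the entropy of the mean assignment is a
    differentiable surrogate for the code entropy that zstd ultimately sees.
    (For very large matrices the (numel, 64) assignment may need chunking.)
    """
    scale = w.detach().abs().max().clamp_min(1e-12) / 31.0
    codes = torch.arange(-32, 32, device=w.device, dtype=w.dtype)
    d2 = (w.reshape(-1, 1) / scale - codes) ** 2       # (numel, 64)
    hist = torch.softmax(-beta * d2, dim=1).mean(0).clamp_min(1e-12)
    return -(hist * hist.log()).sum()

# Loss hookup with the listed hyperparameters; the ndim/size filter below
# is an assumed stand-in for "large 2D linear matrices".
LAMBDA, BETA, WARMUP = 1e-3, 10.0, 200

def compqat_loss(model: torch.nn.Module, base_loss: torch.Tensor, step: int) -> torch.Tensor:
    if step < WARMUP:  # surrogate disabled during warmup
        return base_loss
    penalty = sum(
        soft_int6_entropy(p, beta=BETA)
        for p in model.parameters()
        if p.ndim == 2 and min(p.shape) >= 128
    )
    return base_loss + LAMBDA * penalty
```

`fake_quant_int6` would wrap the same matrices inside the model's forward pass. Minimizing the soft histogram's entropy concentrates weights onto fewer int6 codes, which is what lets zstd shrink the final artifact; `beta` controls how sharply each weight is assigned to its nearest code.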
**Novel Contributions**

- Compression-aware QAT with a differentiable entropy surrogate over soft int6 histograms
- Applying the surrogate only after warmup, and only to large 2D linear matrices
- Demonstrating stable cross-seed behavior for compression-aware training
- A research pivot from 3DCF-style compression ideas to a scoreable CompQAT branch on top of PR #1493 (artifact-size measurement sketched after this list)
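On the "scoreable" side, the artifact size above is plausibly the zstd-compressed size of the quantized weights. A hedged sketch, treating `level: null` as the library default; the helpers are hypothetical, and a real artifact would also carry scales and metadata and might bit-pack four 6-bit codes into three bytes:

```python
import numpy as np
import zstandard  # pip install zstandard

def int6_codes(w: np.ndarray) -> np.ndarray:
    """Symmetric absmax quantization of one weight matrix to int6 codes."""
    scale = max(np.abs(w).max() / 31.0, 1e-12)
    return np.clip(np.round(w / scale), -32, 31).astype(np.int8)

def artifact_size_bytes(weights: list[np.ndarray]) -> int:
    """zstd-compressed size of all quantized matrices, one code per byte."""
    blob = b"".join(int6_codes(w).tobytes() for w in weights)
    return len(zstandard.ZstdCompressor().compress(blob))  # default level
```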