PR #1927

open

Non-Record: PR #1901 base + LQER Asymmetric + Brotli/Byte-Shuffle Compression

by squ11z1
val_bpb: 0.8335
Architecture: Transformer
Optimizer: AdaMuon
Artifact Size: 18,204 bytes

Training Techniques

Quantization
int6 SDClip
bits: 6
scope: all
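The card does not define "SDClip". One plausible reading is a clip threshold set at k standard deviations of each weight row before symmetric 6-bit rounding. The sketch below implements that reading only. The function names and the default k = 3.0 are hypothetical, and packing the 6-bit codes into a dense bitstream is not shown (codes are kept in int8).

```python
import numpy as np

# Hypothetical reading of "int6 SDClip": per-row symmetric 6-bit quantization
# with the clip range set at k standard deviations of the row (an assumption).
def quantize_int6_sdclip(w: np.ndarray, k: float = 3.0):
    qmax = 2 ** (6 - 1) - 1                      # 31: symmetric signed 6-bit range [-31, 31]
    clip = k * w.std(axis=1, keepdims=True)      # per-row clip threshold (k-sigma)
    clip = np.maximum(clip, 1e-8)                # guard against zero scale on constant rows
    scale = (clip / qmax).astype(np.float32)     # one float scale per row
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_int6(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Map integer codes back to floats using the per-row scale.
    return q.astype(np.float32) * scale
```

Values inside the clip range are reconstructed to within half a quantization step; values beyond k sigma saturate at plus or minus 31 steps, trading outlier error for finer resolution on the bulk of the distribution.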
INT2/INT4 LQER
bits: null
scope: top-K residual factors
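LQER-style correction approximates the quantization residual E = W - Q(W) with a low-rank product A B and adds it back at inference. The sketch below uses a truncated SVD at rank 4, matching the rank named under Novel Contributions. The card lists INT2/INT4 bit widths, which may describe how the correction factors are stored, and it does not say what "asymmetric" refers to. Neither detail is modeled here: the factors stay in float, and the function names are hypothetical.

```python
import numpy as np

# Sketch of a low-rank quantization-error correction (LQER-style), assuming a
# plain truncated SVD of the residual; factor quantization is not modeled.
def lqer_correction(w: np.ndarray, w_q: np.ndarray, rank: int = 4):
    residual = w - w_q                                   # E = W - Q(W)
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    a = u[:, :rank] * s[:rank]                           # (out, rank), singular values folded in
    b = vt[:rank, :]                                     # (rank, in)
    return a, b

def lqer_reconstruct(w_q: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Corrected weight: quantized weight plus the rank-k residual estimate.
    return w_q + a @ b
```

By the Eckart-Young theorem, the remaining Frobenius error equals the energy in the discarded singular values of E, so a small rank captures whatever structured part of the error exists at a cost of rank * (out + in) extra values per matrix.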
Architecture
SharedMoE
Mixture-of-experts component inherited from PR #1901 base stack.
parameters: null
DualTokenHashSkip
Token hashing / skip-style architectural component inherited from PR #1901 base stack.
parameters: null
Test-Time Training
score-first TTT
parameters: null
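"Score-first" test-time training generally means each evaluation chunk is scored with the current parameters before the model adapts on that chunk, so no token is scored by a model that has already trained on it. The toy below illustrates only that ordering. As an assumption for illustration, a Laplace-smoothed unigram count model stands in for the network and a count update stands in for gradient steps. The function name is hypothetical.

```python
import math
from collections import Counter

# Toy illustration of score-first adaptation: score a chunk, then update on it.
# The adaptive unigram model is a stand-in for the real network (an assumption).
def score_first_adapt(chunks, vocab_size: int) -> float:
    counts = Counter()
    total = 0
    bits = 0.0
    for chunk in chunks:
        # 1) Score: every token in the chunk is evaluated before any update on it.
        for tok in chunk:
            p = (counts[tok] + 1) / (total + vocab_size)  # Laplace-smoothed probability
            bits -= math.log2(p)
        # 2) Adapt: update the model only on the already-scored chunk.
        counts.update(chunk)
        total += len(chunk)
    return bits
```

With a 4-token vocabulary, the first chunk `[0, 0]` costs 2 bits per token because the model is still uniform. A second identical chunk costs 1 bit per token because the model has since adapted.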
Optimizer
AdaMuon
weight_decay: null
momentum: null
other_params: null
Compression
lzma
level: 9
brotli
level: 11
Other
other
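Stride-2 byte-shuffle applied to the serialized artifact before Brotli compression: bytes at even offsets are grouped ahead of bytes at odd offsets, which can make the stream more regular and improve the compression ratio.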
parameters: {"stride":2}
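A stride-s byte shuffle writes every s-th byte starting at offset 0, then every s-th byte starting at offset 1, and so on. It is exactly invertible given the original length. For 2-byte values such as float16 scales stored little-endian, stride 2 places all low bytes first and all high bytes second, which is one plausible reason it helps a general-purpose compressor. This is a minimal sketch: the helper names are hypothetical, and because the `brotli` package is third-party, the sketch falls back to stdlib LZMA when it is not installed.

```python
import lzma

try:
    import brotli  # third-party "Brotli" package; not part of the standard library
except ImportError:
    brotli = None

def byte_shuffle(data: bytes, stride: int = 2) -> bytes:
    # Group bytes by their offset modulo `stride`.
    return b"".join(data[i::stride] for i in range(stride))

def byte_unshuffle(data: bytes, stride: int = 2) -> bytes:
    # Scatter each group back to its original positions.
    n = len(data)
    out = bytearray(n)
    pos = 0
    for i in range(stride):
        count = len(range(i, n, stride))   # how many bytes had offset i (mod stride)
        out[i::stride] = data[pos:pos + count]
        pos += count
    return bytes(out)

def pack(data: bytes, stride: int = 2) -> bytes:
    shuffled = byte_shuffle(data, stride)
    if brotli is not None:
        return brotli.compress(shuffled, quality=11)   # Brotli at its maximum quality
    return lzma.compress(shuffled, preset=9)           # fallback so the sketch runs anywhere

def unpack(blob: bytes, stride: int = 2) -> bytes:
    raw = brotli.decompress(blob) if brotli is not None else lzma.decompress(blob)
    return byte_unshuffle(raw, stride)
```

For example, `byte_shuffle(b"abcdef")` gives `b"acebdf"`, and odd lengths are handled because the first group simply holds one extra byte.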

Novel Contributions

  • LQER asymmetric rank-4 post-quantization correction applied to a Sigma-Delta-quantized stack
  • Brotli-11 plus stride-2 byte-shuffle replacing LZMA for artifact compression
  • Theoretical delta-BPB estimate for combining LQER and improved compression
  • Patched submission provided with LZMA-base85-wrapped train_gpt.py
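An "LZMA-base85-wrapped" script is presumably a small self-extracting stub: the original source is LZMA-compressed, base85-encoded into a string literal, and decoded and executed at runtime, which shrinks the counted code size for large files. The sketch below shows one way to build such a wrapper. The function name is hypothetical and the exact stub format used for `train_gpt.py` is not given in the PR card.

```python
import base64
import lzma

# Hypothetical wrapper: compress a Python source file into a self-extracting stub.
def wrap_source(source: str) -> str:
    blob = base64.b85encode(lzma.compress(source.encode("utf-8"), preset=9)).decode("ascii")
    # The base85 alphabet contains no double quotes or backslashes,
    # so the blob is safe inside a plain double-quoted string literal.
    return (
        "import base64,lzma\n"
        f'exec(lzma.decompress(base64.b85decode("{blob}")).decode("utf-8"))\n'
    )
```

For very small sources the stub can be larger than the input; the approach pays off only once the compressed, encoded payload plus the fixed stub is smaller than the original file.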