PR #1649 (open)

Non-record: Custom serialization replacing torch.save + zstd-22

by joyceyan
val_bpb: 1.1271
Architecture: Transformer
Optimizer: not specified
Artifact Size: 15,150,085 bytes

Training Techniques

Architecture
  • XSA: applied to the last 4 layers
  • MLP3x: 3x MLP expansion
  • BigramHash: bigram hash embeddings
Weight Averaging
  • EMA with decay 0.997
Quantization
  • mixed int6/int8: int6 for MLP and attention weights, int8 for embeddings
Compression
  • custom format (replaces torch.save + zstd-22)
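
The EMA entry keeps a running average of the weights. With decay 0.997, each update moves the average 0.3% of the way toward the current weights, so the average mostly reflects the last 1/(1 - 0.997) ≈ 333 steps. A minimal sketch of that update rule, assuming parameters are held as plain Python lists keyed by name; the function name `ema_update` and this layout are illustrative, and a real training loop would apply the same rule to framework tensors.

```python
def ema_update(shadow, params, decay=0.997):
    """Update the EMA copy in place: shadow <- decay * shadow + (1 - decay) * params.

    `shadow` and `params` are hypothetical dicts mapping a parameter name
    to a flat list of floats; both must have the same keys and lengths.
    """
    for name, values in params.items():
        avg = shadow[name]
        for i, v in enumerate(values):
            avg[i] = decay * avg[i] + (1.0 - decay) * v


# Usage: start the shadow copy from the initial weights, then update after every step.
params = {"w": [1.0, -2.0]}
shadow = {k: list(v) for k, v in params.items()}
params["w"] = [2.0, -2.0]  # pretend an optimizer step changed the first weight
ema_update(shadow, params)  # shadow["w"][0] moves from 1.0 toward 2.0 by 0.3%
```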
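
The quantization entry stores MLP and attention matrices as int6 and embeddings as int8. The summary does not give the exact scheme, so the sketch below assumes symmetric per-row quantization with one float scale per row; the helper names and the `"emb"` name check are hypothetical. Note that symmetric int6 codes take only 63 distinct values (-31 to 31), which a downstream entropy coder can exploit even when each code is stored in a full byte.

```python
def quantize_row(row, bits):
    """Symmetric per-row quantization to signed `bits`-bit integer codes.

    Returns the codes and the scale needed to dequantize them.
    """
    qmax = (1 << (bits - 1)) - 1  # 31 for int6, 127 for int8
    amax = max(abs(v) for v in row)
    scale = amax / qmax if amax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return codes, scale


def dequantize_row(codes, scale):
    return [c * scale for c in codes]


def bits_for(param_name):
    # Hypothetical naming convention: embeddings get int8, MLP/attention get int6.
    return 8 if "emb" in param_name else 6


row = [0.5, -1.0, 0.25]
codes, scale = quantize_row(row, bits_for("blocks.0.mlp.fc.weight"))
restored = dequantize_row(codes, scale)
```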

Novel Contributions

  • Replaces torch.save + zstd-22 with a custom binary serialization format
  • Uses ANS (asymmetric numeral systems) entropy coding for lossless compression
  • Clusters per-row symbol histograms with K-means so that rows in the same cluster share one probability model
  • Splits tensors into separate streams by dtype and applies dtype-specific transforms such as zigzag encoding and byte shuffling
  • Eliminates pickle and container overhead by storing a compact, compressed JSON header followed by length-prefixed streams
  • Shrinks the artifact by 362,946 bytes relative to the torch.save + zstd-22 baseline, with no change to model weights or val_bpb
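
ANS encodes a whole symbol sequence into a single integer state whose size approaches the sequence's entropy under the given symbol frequencies. The summary does not say which ANS variant the PR uses. The sketch below is a minimal range-variant (rANS) coder that relies on Python's arbitrary-precision integers instead of the fixed-width state and byte-wise renormalization a production coder would use; function names are illustrative. The decoder needs the same frequency table and the symbol count, which a real format would store alongside the state.

```python
from collections import Counter


def cumulative(freqs):
    """Give each symbol the slot range [cum[s], cum[s] + freqs[s]) out of m slots."""
    cum, total = {}, 0
    for s in sorted(freqs):
        cum[s] = total
        total += freqs[s]
    return cum, total


def rans_encode(symbols, freqs):
    """Fold `symbols` into one integer state. Every symbol needs freqs[s] > 0."""
    cum, m = cumulative(freqs)
    x = 1  # initial state; the decoder ends back at this value
    for s in reversed(symbols):  # rANS is last-in, first-out, so encode in reverse
        f = freqs[s]
        x = (x // f) * m + cum[s] + (x % f)
    return x


def rans_decode(x, n, freqs):
    """Recover `n` symbols from state `x`; also returns the final state."""
    cum, m = cumulative(freqs)
    out = []
    for _ in range(n):
        slot = x % m
        s = next(t for t in cum if cum[t] <= slot < cum[t] + freqs[t])
        out.append(s)
        x = freqs[s] * (x // m) + slot - cum[s]
    return out, x


data = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]
freqs = Counter(data)  # the probability model; a real format stores it too
state = rans_encode(data, freqs)
payload = state.to_bytes((state.bit_length() + 7) // 8, "little")
decoded, final_state = rans_decode(state, len(data), freqs)
```

Frequent symbols get wide slot ranges, so encoding them grows the state by a small factor (m / f) and costs few bits; production coders usually also round m to a power of two so the division by m becomes a shift.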
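
Storing a separate frequency table for every weight row would eat into the savings, so rows with similar value distributions can share one model. A minimal sketch of that idea, assuming Euclidean K-means on normalized per-row histograms with a deterministic farthest-point initialization; the PR may use a different distance (for example one based on coding cost) and a different initialization. The pooled table adds 1 to every count so each symbol keeps a nonzero frequency, as an ANS coder requires. All function names are hypothetical.

```python
from collections import Counter


def row_histogram(row, alphabet):
    """Normalized histogram of one row's quantized values."""
    counts = Counter(row)
    return [counts[a] / len(row) for a in alphabet]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd iterations with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def shared_tables(rows, labels, alphabet, k):
    """One frequency table per cluster, pooled over its rows (+1 smoothing)."""
    tables = [Counter({a: 1 for a in alphabet}) for _ in range(k)]
    for row, lab in zip(rows, labels):
        tables[lab].update(row)
    return tables


rows = [[0, 0, 0, 1], [0, 0, 1, 0], [5, 5, 5, 4], [5, 4, 5, 5]]
alphabet = range(6)
labels = kmeans([row_histogram(r, alphabet) for r in rows], k=2)
tables = shared_tables(rows, labels, alphabet, k=2)
```

Each row then only needs to record its cluster index, and the header carries k tables instead of one per row.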
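
Zigzag encoding and byte shuffling are standard preconditioning steps. Zigzag maps signed integers to unsigned ones so that small magnitudes of either sign become small codes (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4). Byte shuffling regroups multi-byte values so that all first bytes come first, then all second bytes, which tends to place similar bytes next to each other. The summary does not state which dtype streams receive which transform; the sketch assumes zigzag for signed integer codes and byte shuffling for multi-byte floats such as float16 scales, and the helper names are illustrative.

```python
import struct


def zigzag(values):
    """Signed -> unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return [2 * v if v >= 0 else -2 * v - 1 for v in values]


def unzigzag(codes):
    return [(z >> 1) ^ -(z & 1) for z in codes]


def byte_shuffle(data, elem_size):
    """Group byte 0 of every element, then byte 1, and so on."""
    assert len(data) % elem_size == 0
    return b"".join(data[i::elem_size] for i in range(elem_size))


def byte_unshuffle(data, elem_size):
    n = len(data) // elem_size
    out = bytearray(len(data))
    for i in range(elem_size):
        out[i::elem_size] = data[i * n:(i + 1) * n]
    return bytes(out)


# Usage: signed codes become small unsigned values; float16 scales get their bytes regrouped.
codes = [-2, -1, 0, 1, 2]
unsigned = zigzag(codes)
scales = struct.pack("<4e", 0.01, 0.02, 0.015, 0.03)  # four float16 values
shuffled = byte_shuffle(scales, 2)
```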
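
torch.save writes a zip archive containing a pickle plus per-storage files, which adds per-file and per-object overhead. A leaner layout needs only a small header describing each tensor followed by the raw streams. A minimal sketch, assuming a little-endian u32 length prefix before the header and before each stream, and zlib as the header compressor; the summary says only "compressed JSON header", so the compressor, the field names, and the prefix width here are all hypothetical.

```python
import json
import struct
import zlib


def pack(meta, streams):
    """[u32 len][zlib(JSON header)] followed by [u32 len][stream bytes] per stream."""
    header = zlib.compress(json.dumps(meta, separators=(",", ":")).encode("utf-8"), 9)
    parts = [struct.pack("<I", len(header)), header]
    for s in streams:
        parts.append(struct.pack("<I", len(s)))
        parts.append(s)
    return b"".join(parts)


def unpack(blob):
    (header_len,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    meta = json.loads(zlib.decompress(blob[offset:offset + header_len]))
    offset += header_len
    streams = []
    while offset < len(blob):
        (n,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        streams.append(blob[offset:offset + n])
        offset += n
    return meta, streams


# Hypothetical header: one entry per tensor, pointing at the stream holding its data.
meta = {"tensors": [{"name": "blocks.0.mlp.fc.weight", "shape": [4, 8], "dtype": "int6", "stream": 0}]}
blob = pack(meta, [b"\x00" * 32])
```

Because every length is explicit, a reader can slice each stream straight out of the buffer without a pickle loader, and the only per-tensor cost is its entry in the compressed header.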