PR #1649
Status: open
Non-record: Custom serialization replacing torch.save + zstd-22
by joyceyan
val_bpb
1.1271
Architecture
Transformer
Optimizer
—
Artifact Size
15,150,085 bytes
Training Techniques
Architecture
XSA
Uses XSA on the last 4 layers.
parameters: {"layers":4}
MLP3x
Uses a 3x MLP expansion.
parameters: null
BigramHash
Uses bigram hash embeddings.
parameters: null
Weight Averaging
EMA
parameters: {"decay":0.997}
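The EMA entry above is the standard exponential moving average of weights, ema ← decay·ema + (1 − decay)·w, with the listed decay of 0.997. A minimal sketch for illustration only — the helper name and flat list-of-floats representation are not from the PR:

```python
def ema_update(ema_w, w, decay=0.997):
    # Standard EMA of model weights: ema <- decay * ema + (1 - decay) * w
    return [decay * e + (1.0 - decay) * x for e, x in zip(ema_w, w)]

# Repeatedly averaging toward a fixed weight vector converges geometrically:
# after k steps from zero, each entry equals (1 - decay**k) * target.
ema = [0.0, 0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0, 2.0])
```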
Quantization
mixed int6/int8
bits: null
scope: MLP+attention int6, embeddings int8
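The mixed int6/int8 scope above implies signed integer quantization of the weight tensors. A toy per-tensor symmetric int6 sketch under that assumption — the PR's actual scheme, scale granularity, and rounding are not specified here, and the function names are illustrative:

```python
def quantize_int6(xs):
    # Symmetric quantization to the 6-bit signed range [-31, 31]:
    # pick a scale so the largest magnitude maps to +/-31, then round.
    scale = max(abs(x) for x in xs) / 31 or 1.0  # avoid div-by-zero on all-zero input
    return [round(x / scale) for x in xs], scale

def dequantize(q, scale):
    # Reconstruct approximate floats from the integer codes.
    return [v * scale for v in q]
```

int8 for embeddings would be the same idea with range [-127, 127]; the wider range trades size for lower reconstruction error on the embedding table.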
Compression
custom
level: null
Novel Contributions
- Replaces torch.save + zstd-22 with a custom binary serialization format
- Uses ANS entropy coding for lossless compression
- Clusters row-level histograms with K-means to share probability models
- Separates dtype streams and applies dtype-specific transforms such as zigzag encoding and byte shuffling
- Removes pickle/container overhead by storing a compact compressed JSON header and length-prefixed streams
- Achieves a 362,946-byte reduction versus the baseline artifact (15,513,031 → 15,150,085 bytes) without changing model weights or BPB
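The container layout described above can be illustrated with a toy sketch. This is not the PR's code: the function names (`zigzag`, `byte_shuffle`, `pack`, `unpack`) are invented for illustration, the ANS entropy coder and K-means histogram clustering are omitted entirely, and zlib merely stands in for header compression. It shows only the zigzag transform, byte shuffling, and the compressed-JSON-header plus length-prefixed-streams framing:

```python
import json
import struct
import zlib

def zigzag(n: int) -> int:
    # Zigzag-map signed ints to unsigned so small magnitudes get small codes:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    return (n << 1) if n >= 0 else ((-n) << 1) - 1

def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    # Regroup bytes by position within each fixed-width item (all byte-0s,
    # then all byte-1s, ...) so runs of similar high bytes compress better.
    return bytes(data[j] for i in range(itemsize)
                 for j in range(i, len(data), itemsize))

def pack(streams: dict) -> bytes:
    # Compact container: u32 header length, zlib-compressed JSON header
    # naming the streams, then one u64-length-prefixed payload per stream.
    header = zlib.compress(json.dumps({"streams": list(streams)}).encode())
    out = [struct.pack("<I", len(header)), header]
    for payload in streams.values():
        out.append(struct.pack("<Q", len(payload)))
        out.append(payload)
    return b"".join(out)

def unpack(blob: bytes) -> dict:
    (hlen,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(zlib.decompress(blob[4:4 + hlen]))
    streams, off = {}, 4 + hlen
    for name in header["streams"]:
        (n,) = struct.unpack_from("<Q", blob, off)
        streams[name] = blob[off + 8:off + 8 + n]
        off += 8 + n
    return streams

# Round-trip check on a tiny int16 stream.
codes = [zigzag(v) for v in [0, -1, 1, -2, 2]]   # [0, 1, 2, 3, 4]
raw = struct.pack("<4h", 1, 2, 3, 4)
shuffled = byte_shuffle(raw, 2)                  # low bytes first, then high bytes
blob = pack({"int16": shuffled, "codes": bytes(codes)})
assert unpack(blob)["int16"] == shuffled
```

Separating dtype streams this way lets each stream get its own transform before entropy coding, and the self-describing header replaces the pickle machinery that `torch.save` would otherwise embed.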