PR #1154

open

Non-record: Polar STE QAT for structural weights

by LucasErcolano
val_bpb: 1.7757
Architecture: Transformer
Optimizer: not listed
Artifact Size: 14,782,032 bytes

Training Techniques

Quantization
STE QAT
bits: null
scope: structural weights
polar
bits: null
scope: large 2D structural tensors at export
Compression
zlib
level: null
Evaluation
long context eval
parameters: {"context_length":32768}
Sequence Length
sequence_length
train_length: 256
eval_length: 32768
Other
other
Fresh-process artifact-load harness to verify autonomous reconstruction from polar+zlib artifacts
parameters: null
other
Distributed-safe final KV evaluation path to avoid DDP deadlock during final eval
parameters: null
Architecture
RoPE
Fixed the RoPE eval cache so that validation under inference mode no longer leaves cached inference tensors behind, which previously broke later training steps
parameters: null
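
For readers unfamiliar with STE QAT, the idea listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PR's code: the PR does not describe its polar encoding here, so a plain uniform rounding quantizer (`fake_quantize`, a hypothetical name) stands in for it, and the model is a single scalar weight. The forward pass only sees the quantized weight, while the gradient is applied to the latent full-precision weight as if the quantizer were the identity.

```python
def fake_quantize(w, scale=0.25):
    """Round w to the nearest multiple of `scale`.

    Assumption: a uniform grid stands in for the PR's polar scheme.
    """
    return round(w / scale) * scale


def train_ste(xs, ys, w=0.0, lr=0.05, steps=200, scale=0.25):
    """Fit y ~ w * x while the forward pass uses only the quantized weight."""
    for _ in range(steps):
        w_q = fake_quantize(w, scale)  # forward: quantized weight
        # Gradient of the mean squared error with respect to w_q.
        grad_wq = sum(2 * (w_q * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Straight-through estimator: treat d(w_q)/d(w) as 1, so the
        # gradient computed at w_q updates the latent weight directly.
        w -= lr * grad_wq
    return w, fake_quantize(w, scale)


xs = [1.0, 2.0, 3.0]
ys = [1.5 * x for x in xs]
latent, quantized = train_ste(xs, ys)
print(latent, quantized)
```

In this toy setup the latent weight stops moving once its quantized value reaches the target on the grid, which is the behavior QAT relies on: the exported (quantized) weight is what the loss was actually optimized against.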

Novel Contributions

  • Polar STE QAT for structural weights
  • Polar weight export path for large 2D structural tensors
  • Shared polar row encode/decode helpers for QAT and serialization
  • Roundtrip dequantization in final validation
  • Fresh-process artifact isolation/load harness for polar+zlib reconstruction
  • Distributed-safe final KV evaluation under DDP
  • RoPE cache fix for inference-mode validation
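
The export, compression, and roundtrip-dequantization items above can be sketched together as follows. This is a hedged, in-process illustration only: the row encoder is a per-row int8 quantizer standing in for the PR's polar row helpers, the byte layout and the names `encode_rows` and `decode_rows` are hypothetical, and the zlib level of 9 is an assumption because the PR lists the level as null. It does not reproduce the fresh-process load harness.

```python
import struct
import zlib


def encode_rows(rows, levels=127):
    """Quantize each row to int8 codes plus one float32 scale, then zlib-compress.

    Assumption: this per-row scheme is a stand-in for the polar row encoder.
    """
    out = [struct.pack("<II", len(rows), len(rows[0]))]
    for row in rows:
        scale = max(abs(v) for v in row) / levels or 1.0
        # Round the scale to float32 first so the encoder and decoder agree.
        (scale,) = struct.unpack("<f", struct.pack("<f", scale))
        codes = [round(v / scale) for v in row]
        out.append(struct.pack("<f", scale))
        out.append(struct.pack(f"<{len(row)}b", *codes))
    return zlib.compress(b"".join(out), 9)  # level 9 is an assumption


def decode_rows(blob):
    """Rebuild dequantized rows from the compressed artifact bytes alone."""
    raw = zlib.decompress(blob)
    n_rows, n_cols = struct.unpack_from("<II", raw, 0)
    offset = 8
    rows = []
    for _ in range(n_rows):
        (scale,) = struct.unpack_from("<f", raw, offset)
        offset += 4
        codes = struct.unpack_from(f"<{n_cols}b", raw, offset)
        offset += n_cols
        rows.append([c * scale for c in codes])
    return rows


weights = [[0.5, -1.0, 0.25], [0.0, 0.0, 0.0]]
blob = encode_rows(weights)
restored = decode_rows(blob)  # what a final validation pass would evaluate
```

Sharing one encode/decode pair between training-time fake quantization and serialization, as the contribution list describes, keeps the weights seen during QAT identical to those reconstructed from the artifact; evaluating on `restored` rather than `weights` is what makes the reported metric reflect the shipped bytes.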