val_bpb: 1.7757
Architecture: Transformer
Optimizer: —
Artifact Size: 14,782,032 bytes
Training Techniques
- Quantization
  - STE QAT (bits: null; scope: structural weights)
  - polar (bits: null; scope: large 2D structural tensors at export)
- Compression
  - zlib (level: null)
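The metadata leaves the bit widths unspecified (bits: null), so the following is only a sketch of one plausible reading of "polar" quantization: each weight row is split into a scalar magnitude plus a 1-bit sign (direction) per weight. The same row encode/decode helpers serve both the QAT forward pass and the zlib-compressed export; `polar_encode_row`, `fake_quant_row`, and `export_matrix` are hypothetical names, not the artifact's actual API.

```python
import zlib
import numpy as np

def polar_encode_row(row: np.ndarray):
    # Illustrative "polar" split: one scalar magnitude per row plus a
    # 1-bit sign per weight. Actual bit widths are unspecified (bits: null).
    scale = float(np.abs(row).mean()) or 1.0
    signs = np.signbit(row).astype(np.uint8)
    return scale, np.packbits(signs)

def polar_decode_row(scale: float, packed: np.ndarray, n: int) -> np.ndarray:
    signs = np.unpackbits(packed)[:n]
    return np.where(signs == 1, -scale, scale).astype(np.float32)

def fake_quant_row(row: np.ndarray) -> np.ndarray:
    # STE QAT forward pass: train against the dequantized weights.
    # The straight-through estimator copies the gradient of this output
    # directly onto `row` (identity backward); an autograd framework would
    # handle that part, so numpy shows only the forward round trip.
    scale, packed = polar_encode_row(row)
    return polar_decode_row(scale, packed, row.size)

def export_matrix(w: np.ndarray) -> bytes:
    # Export path for large 2D tensors: per-row polar codes, then zlib.
    payload = bytearray()
    for row in w:
        scale, packed = polar_encode_row(row)
        payload += np.float32(scale).tobytes() + packed.tobytes()
    return zlib.compress(bytes(payload))
```

Sharing one encode/decode pair between training-time fake quantization and serialization keeps the deployed weights bit-identical to what training optimized against.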
Evaluation
- long context eval (parameters: {"context_length": 32768})

Sequence Length
- train_length: 256
- eval_length: 32768
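The headline metric name suggests bits per byte. Assuming the evaluation loss is a summed negative log-likelihood in nats and the raw byte count of the eval text is known, the conversion is a one-liner (the function name is illustrative):

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    # Convert a summed negative log-likelihood (in nats) into bits,
    # then normalize by the number of raw input bytes.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Evaluating at a 32768-token context after training at length 256 probes length extrapolation rather than in-distribution performance, which is why the long-context eval is listed separately.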
Other
- Fresh-process artifact-load harness to verify autonomous reconstruction from polar+zlib artifacts (parameters: null)
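The point of a fresh-process harness is that reconstruction must succeed in an interpreter that shares no in-memory state with training. A minimal sketch, assuming only that the artifact is a zlib-compressed blob on disk; the child program here merely decompresses and reports a size, where the real harness would rebuild the full model:

```python
import subprocess
import sys
import zlib

# Child program run in a brand-new interpreter: it gets only the file path.
CHILD = (
    "import sys, zlib\n"
    "blob = open(sys.argv[1], 'rb').read()\n"
    "print(len(zlib.decompress(blob)))\n"
)

def fresh_process_decompressed_size(artifact_path: str) -> int:
    # Spawning sys.executable guarantees the load path cannot lean on
    # caches, globals, or CUDA state left over from the training process.
    out = subprocess.run(
        [sys.executable, "-c", CHILD, artifact_path],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())
```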
- Distributed-safe final KV evaluation path to avoid a DDP deadlock during the final eval (parameters: null)
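The classic DDP deadlock happens when only some ranks enter a collective during the final evaluation, so the others block forever. A framework-agnostic sketch of the safe pattern, where `eval_fn` and `all_reduce` are stand-ins for the real shard evaluation and `torch.distributed.all_reduce`:

```python
def distributed_safe_final_eval(rank, world_size, eval_fn, all_reduce):
    # Every rank runs the identical code path: evaluate the local shard,
    # then join the collective. If rank 0 alone evaluated while the other
    # ranks skipped ahead or exited, the all_reduce would hang under DDP.
    local = eval_fn(rank)
    total = all_reduce(local)
    return total / world_size  # identical averaged result on every rank
```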
Architecture
- RoPE: fixed the RoPE eval-cache interaction so that inference-mode validation no longer leaves cached inference tensors behind that break later training steps (parameters: null)
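In PyTorch, tensors created under `torch.inference_mode()` cannot participate in later autograd graphs, so a RoPE cos/sin table cached during inference-mode validation poisons the next training step. A pure-Python sketch of the fix, where the `inference` flag stands in for `torch.is_inference_mode_enabled()` and the cache invalidates any inference-built entry before training reuses it:

```python
import math

class RoPECache:
    def __init__(self, head_dim: int, base: float = 10000.0):
        self.head_dim = head_dim
        self.base = base
        self._cache = None  # (seq_len, built_under_inference, cos, sin)

    def get(self, seq_len: int, inference: bool):
        # Rebuild if the table is too short OR was built under inference
        # mode and we are now training (the original bug: stale
        # inference-mode tensors leaking into training steps).
        if (self._cache is None or self._cache[0] < seq_len
                or (self._cache[1] and not inference)):
            inv_freq = [self.base ** (-2 * i / self.head_dim)
                        for i in range(self.head_dim // 2)]
            cos = [[math.cos(p * f) for f in inv_freq] for p in range(seq_len)]
            sin = [[math.sin(p * f) for f in inv_freq] for p in range(seq_len)]
            self._cache = (seq_len, inference, cos, sin)
        return self._cache[2][:seq_len], self._cache[3][:seq_len]
```

A training-mode call after an inference-mode validation pass forces a rebuild instead of reusing the tainted table, which is the behavior the fix restores.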
Novel Contributions
- Polar STE QAT for structural weights
- Polar weight export path for large 2D structural tensors
- Shared polar row encode/decode helpers for QAT and serialization
- Roundtrip dequantization in final validation
- Fresh-process artifact isolation/load harness for polar+zlib reconstruction
- Distributed-safe final KV evaluation under DDP
- RoPE cache fix for inference-mode validation