val_bpb: 1.7757
Architecture: Transformer
Optimizer: —
Artifact Size: 14,782,032 bytes
Training Techniques
- Quantization
  - STE QAT (bits: null; scope: structural weights)
  - polar (bits: null; scope: large 2D structural tensors at export)
- Compression
  - zlib (level: null)
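The metadata leaves the bit widths unspecified (bits: null), so the following is only a sketch of one plausible reading of "polar" quantization: each weight row is split into a scalar magnitude plus a 1-bit sign (direction) per weight. The same row encode/decode helpers serve both the QAT forward pass and the zlib-compressed export; `polar_encode_row`, `fake_quant_row`, and `export_matrix` are hypothetical names, not the artifact's actual API.

```python
import zlib
import numpy as np

def polar_encode_row(row: np.ndarray):
    # Illustrative "polar" split: one scalar magnitude per row plus a
    # 1-bit sign per weight. Actual bit widths are unspecified (bits: null).
    scale = float(np.abs(row).mean()) or 1.0
    signs = np.signbit(row).astype(np.uint8)
    return scale, np.packbits(signs)

def polar_decode_row(scale: float, packed: np.ndarray, n: int) -> np.ndarray:
    signs = np.unpackbits(packed)[:n]
    return np.where(signs == 1, -scale, scale).astype(np.float32)

def fake_quant_row(row: np.ndarray) -> np.ndarray:
    # STE QAT forward pass: train against the dequantized weights.
    # The straight-through estimator copies the gradient of this output
    # directly onto `row` (identity backward); an autograd framework would
    # handle that part, so numpy shows only the forward round trip.
    scale, packed = polar_encode_row(row)
    return polar_decode_row(scale, packed, row.size)

def export_matrix(w: np.ndarray) -> bytes:
    # Export path for large 2D tensors: per-row polar codes, then zlib.
    payload = bytearray()
    for row in w:
        scale, packed = polar_encode_row(row)
        payload += np.float32(scale).tobytes() + packed.tobytes()
    return zlib.compress(bytes(payload))
```

Sharing one encode/decode pair between training-time fake quantization and serialization keeps the deployed weights bit-identical to what training optimized against.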
Evaluation
- long context eval (parameters: {"context_length": 32768})

Sequence Length
- train_length: 256
- eval_length: 32768
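The headline metric name suggests bits per byte. Assuming the evaluation loss is a summed negative log-likelihood in nats and the raw byte count of the eval text is known, the conversion is a one-liner (the function name is illustrative):

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    # Convert a summed negative log-likelihood (in nats) into bits,
    # then normalize by the number of raw input bytes.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Evaluating at a 32768-token context after training at length 256 probes length extrapolation rather than in-distribution performance, which is why the long-context eval is listed separately.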
Other
- Fresh-process artifact-load harness to verify autonomous reconstruction from polar+zlib artifacts (parameters: null)
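The point of a fresh-process harness is that reconstruction must succeed in an interpreter that shares no in-memory state with training. A minimal sketch, assuming only that the artifact is a zlib-compressed blob on disk; the child program here merely decompresses and reports a size, where the real harness would rebuild the full model:

```python
import subprocess
import sys
import zlib

# Child program run in a brand-new interpreter: it gets only the file path.
CHILD = (
    "import sys, zlib\n"
    "blob = open(sys.argv[1], 'rb').read()\n"
    "print(len(zlib.decompress(blob)))\n"
)

def fresh_process_decompressed_size(artifact_path: str) -> int:
    # Spawning sys.executable guarantees the load path cannot lean on
    # caches, globals, or CUDA state left over from the training process.
    out = subprocess.run(
        [sys.executable, "-c", CHILD, artifact_path],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())
```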
- Distributed-safe final KV evaluation path to avoid a DDP deadlock during the final eval (parameters: null)
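The classic DDP deadlock happens when only some ranks enter a collective during the final evaluation, so the others block forever. A framework-agnostic sketch of the safe pattern, where `eval_fn` and `all_reduce` are stand-ins for the real shard evaluation and `torch.distributed.all_reduce`:

```python
def distributed_safe_final_eval(rank, world_size, eval_fn, all_reduce):
    # Every rank runs the identical code path: evaluate the local shard,
    # then join the collective. If rank 0 alone evaluated while the other
    # ranks skipped ahead or exited, the all_reduce would hang under DDP.
    local = eval_fn(rank)
    total = all_reduce(local)
    return total / world_size  # identical averaged result on every rank
```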
Architecture
- RoPE: fixed the RoPE eval-cache interaction so that inference-mode validation no longer leaves cached inference tensors behind that break later training steps (parameters: null)
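In PyTorch, tensors created under `torch.inference_mode()` cannot participate in later autograd graphs, so a RoPE cos/sin table cached during inference-mode validation poisons the next training step. A pure-Python sketch of the fix, where the `inference` flag stands in for `torch.is_inference_mode_enabled()` and the cache invalidates any inference-built entry before training reuses it:

```python
import math

class RoPECache:
    def __init__(self, head_dim: int, base: float = 10000.0):
        self.head_dim = head_dim
        self.base = base
        self._cache = None  # (seq_len, built_under_inference, cos, sin)

    def get(self, seq_len: int, inference: bool):
        # Rebuild if the table is too short OR was built under inference
        # mode and we are now training (the original bug: stale
        # inference-mode tensors leaking into training steps).
        if (self._cache is None or self._cache[0] < seq_len
                or (self._cache[1] and not inference)):
            inv_freq = [self.base ** (-2 * i / self.head_dim)
                        for i in range(self.head_dim // 2)]
            cos = [[math.cos(p * f) for f in inv_freq] for p in range(seq_len)]
            sin = [[math.sin(p * f) for f in inv_freq] for p in range(seq_len)]
            self._cache = (seq_len, inference, cos, sin)
        return self._cache[2][:seq_len], self._cache[3][:seq_len]
```

A training-mode call after an inference-mode validation pass forces a rebuild instead of reusing the tainted table, which is the behavior the fix restores.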
Novel Contributions
- Polar STE QAT for structural weights
- Polar weight export path for large 2D structural tensors
- Shared polar row encode/decode helpers for QAT and serialization
- Roundtrip dequantization in final validation
- Fresh-process artifact isolation/load harness for polar+zlib reconstruction
- Distributed-safe final KV evaluation under DDP
- RoPE cache fix for inference-mode validation