val_bpb: 1.1129
Architecture: Transformer
Optimizer: —
Artifact Size: 15,826,148 bytes
Training Techniques

Architecture: XSA
Uses the public AR self-generated GPTQ + XSA-all stack as the base model variant.
parameters: null
Quantization: GPTQ
bits: null
scope: MLP weights
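
The card leaves `bits` unspecified, so the sketch below assumes a 4-bit symmetric per-row grid purely for illustration. It shows the column-by-column error-propagation step that GPTQ is built on, restricted to MLP weight matrices; the calibration inputs, the dampening constant, the helper names, and the target name patterns (mirroring the rotation parameters recorded further down) are assumptions, not the submission's code.

```python
# Heavily simplified GPTQ-style quantization, scoped to MLP weights.
# Assumptions: 4-bit symmetric per-row grid, Hessian proxy H = X^T X from
# calibration activations, plain column-wise error propagation
# (no blocking, no Cholesky reordering as in the full GPTQ algorithm).
import numpy as np

def gptq_quantize(W: np.ndarray, X: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize W ([out, in]) given calibration inputs X ([n_samples, in])."""
    W = W.astype(np.float64).copy()
    n_in = W.shape[1]
    H = X.T @ X
    H += 0.01 * np.mean(np.diag(H)) * np.eye(n_in)   # dampening for invertibility
    Hinv = np.linalg.inv(H)
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax + 1e-12
    Q = np.zeros_like(W)
    for i in range(n_in):                            # quantize one column at a time
        q = np.clip(np.round(W[:, i] / scale[:, 0]), -qmax, qmax) * scale[:, 0]
        Q[:, i] = q
        err = (W[:, i] - q) / Hinv[i, i]
        W[:, i + 1:] -= np.outer(err, Hinv[i, i + 1:])  # push error onto later columns
    return Q

def quantize_mlp_weights(state: dict, calib: dict, targets=("mlp_up", "mlp_down")) -> dict:
    """Apply the quantizer only to parameters whose name marks them as MLP weights."""
    return {name: gptq_quantize(W, calib[name]) if any(t in name for t in targets) else W
            for name, W in state.items()}
```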
Other
Rotation-aware quantization: compact right-rotations are applied before GPTQ and the inverse rotation after dequantization (sketched below).
parameters: {"mode":"hadamard","targets":["mlp_up","mlp_down"],"block_sizes":[128,256,512]}
Blockwise Walsh-Hadamard right-rotations applied during export/evaluation.
parameters: {"targets":["mlp_up","mlp_down"]}
Evaluation: sliding window eval
parameters: {"stride":64}
LR Schedule: warmdown
parameters: {"warmdown_iters":4000}
Sequence Length
train_length: 1024
eval_length: 1024
Novel Contributions
- Non-record submission using the same leaderboard metric and artifact accounting
- Rotation-aware GPTQ export with Hadamard right-rotations
- Per-layer block-size search over 128, 256, and 512 for selected MLP matrices (see the sketch after this list)
- Improved export-path val_bpb on the same checkpoint while staying under the 16MB cap
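
A sketch of the per-layer block-size search, assuming the selection criterion is round-trip reconstruction error with a round-to-nearest stand-in for GPTQ; the submission's actual criterion and quantizer may differ.

```python
# Sketch of a per-layer block-size search: for each targeted MLP matrix, try
# each candidate block size and keep the one with the smallest reconstruction
# error after rotate -> quantize -> inverse-rotate.
import numpy as np
from scipy.linalg import hadamard

def _round_trip(W: np.ndarray, block: int, bits: int = 4) -> np.ndarray:
    """Blockwise Hadamard right-rotation, round-to-nearest quantize, inverse rotation."""
    H = hadamard(block).astype(np.float64) / np.sqrt(block)
    Wr = W.reshape(W.shape[0], -1, block) @ H            # rotate each column block
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(Wr).max(axis=(1, 2), keepdims=True) / qmax + 1e-12
    Wq = np.clip(np.round(Wr / scale), -qmax, qmax) * scale
    return (Wq @ H.T).reshape(W.shape)                   # undo the rotation

def pick_block_size(W: np.ndarray, candidates=(128, 256, 512)) -> int:
    """Return the candidate block size minimizing mean squared round-trip error."""
    valid = [b for b in candidates if W.shape[1] % b == 0]
    assert valid, "no candidate block size tiles the weight width"
    return min(valid, key=lambda b: float(np.mean((W - _round_trip(W, b)) ** 2)))

# Hypothetical use over a checkpoint's targeted matrices:
# chosen = {name: pick_block_size(W) for name, W in state.items()
#           if "mlp_up" in name or "mlp_down" in name}
```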