PR #1224

open

Add non-record AR GPTQ XSA ROTQ Hadamard submission

by vermissa0ss
val_bpb
1.1129
Architecture
Transformer
Optimizer
Artifact Size
15,826,148 bytes

Training Techniques

Architecture
XSA
Uses the public AR self-generated GPTQ + XSA-all stack as the base model variant.
parameters: null
Quantization
GPTQ
bits: null
scope: MLP weights
Other
other
Rotation-aware quantization using compact right-rotations before GPTQ and inverse rotation after dequantization.
parameters: {"mode":"hadamard","targets":["mlp_up","mlp_down"],"block_sizes":[128,256,512]}
other
Blockwise Walsh-Hadamard right-rotations applied during export/evaluation.
parameters: {"targets":["mlp_up","mlp_down"]}
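The two rotation entries above can be sketched together: a blockwise Walsh-Hadamard right-rotation is orthonormal and symmetric, hence its own inverse, so rotating the MLP weight columns before GPTQ and applying the same rotation after dequantization recovers the weights up to quantization error. A minimal numpy sketch (function names are illustrative, not from the submission):

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix of size n (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def rotate_blockwise(W, block):
    """Right-multiply each contiguous column block of W by an orthonormal
    Hadamard matrix, as a blockwise right-rotation for export/evaluation."""
    n = W.shape[1]
    assert n % block == 0, "column count must be divisible by the block size"
    H = hadamard(block)
    out = W.copy()
    for j in range(0, n, block):
        out[:, j:j+block] = out[:, j:j+block] @ H
    return out
```

Because the normalized Hadamard matrix is involutory, applying `rotate_blockwise` twice is the identity, which is what makes the inverse rotation on the export path exact.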
Evaluation
sliding window eval
parameters: {"stride":64}
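Sliding-window evaluation with a small stride scores every token exactly once while giving each token up to a full window of left context. A sketch of the window bookkeeping, assuming the common strided convention where only not-yet-scored tokens in each window contribute to the loss (helper name is hypothetical):

```python
def sliding_window_spans(n_tokens, window=1024, stride=64):
    """Plan (begin, end, n_scored) spans for strided sliding-window eval:
    each window covers tokens [begin, end), but only the tokens past the
    previous window's end are scored, so every token is scored once."""
    spans = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        spans.append((begin, end, end - prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans
```

With stride 64 and the 1024-token eval length above, each forward pass after the first scores only 64 new tokens, trading compute for longer average context per scored token.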
LR Schedule
warmdown
parameters: {"warmdown_iters":4000}
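A warmdown schedule holds the learning rate at its base value and then decays it to zero over the final warmdown_iters steps. A hedged sketch of the multiplier, assuming linear decay (the card does not specify the decay shape):

```python
def lr_scale(step, total_steps, warmdown_iters=4000):
    """LR multiplier: 1.0 until the final warmdown_iters steps,
    then a linear ramp down to 0.0 at total_steps (assumed shape)."""
    if step < total_steps - warmdown_iters:
        return 1.0
    return max(0.0, (total_steps - step) / warmdown_iters)
```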
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024

Novel Contributions

  • Non-record submission using the same leaderboard metric and artifact accounting
  • Rotation-aware GPTQ export with Hadamard right-rotations
  • Per-layer block-size search over 128, 256, and 512 for selected MLP matrices
  • Improved export-path val_bpb on the same checkpoint while staying under the 16 MB cap
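The per-layer block-size search in the third bullet can be sketched as picking, for each selected matrix, the Hadamard block size whose rotate → quantize → unrotate round trip gives the lowest reconstruction error. A self-contained sketch using per-tensor round-to-nearest as a stand-in for GPTQ (all names hypothetical, not from the submission):

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def rotate(W, b):
    """Blockwise right-rotation of W's columns; H is its own inverse."""
    out = W.copy()
    H = hadamard(b)
    for j in range(0, W.shape[1], b):
        out[:, j:j+b] = out[:, j:j+b] @ H
    return out

def quantize_rtn(W, bits=4):
    """Round-to-nearest per-tensor quantizer, a stand-in for GPTQ."""
    s = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / s) * s

def search_block_size(W, candidates=(128, 256, 512)):
    """Return the candidate block size minimizing round-trip error."""
    best, best_err = None, float("inf")
    for b in candidates:
        if W.shape[1] % b:
            continue  # block must divide the column count
        back = rotate(quantize_rtn(rotate(W, b)), b)
        err = float(np.linalg.norm(back - W))
        if err < best_err:
            best, best_err = b, err
    return best
```

Running this per layer over 128, 256, and 512 matches the per-layer search described above, with GPTQ substituted for the round-to-nearest stand-in.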