val_bpb: 1.1129
Architecture: Transformer
Optimizer: —
Artifact Size: 15,826,148 bytes
Training Techniques

Architecture: XSA
Uses the public AR self-generated GPTQ + XSA-all stack as the base model variant.
parameters: null
Quantization: GPTQ
bits: null
scope: MLP weights
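
The card leaves `bits` unspecified, so the sketch below assumes a 4-bit symmetric per-row grid purely for illustration. It shows the column-by-column error-propagation step that GPTQ is built on, restricted to MLP weight matrices; the calibration inputs, the dampening constant, the helper names, and the target name patterns (mirroring the rotation parameters recorded further down) are assumptions, not the submission's code.

```python
# Heavily simplified GPTQ-style quantization, scoped to MLP weights.
# Assumptions: 4-bit symmetric per-row grid, Hessian proxy H = X^T X from
# calibration activations, plain column-wise error propagation
# (no blocking, no Cholesky reordering as in the full GPTQ algorithm).
import numpy as np

def gptq_quantize(W: np.ndarray, X: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize W ([out, in]) given calibration inputs X ([n_samples, in])."""
    W = W.astype(np.float64).copy()
    n_in = W.shape[1]
    H = X.T @ X
    H += 0.01 * np.mean(np.diag(H)) * np.eye(n_in)   # dampening for invertibility
    Hinv = np.linalg.inv(H)
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax + 1e-12
    Q = np.zeros_like(W)
    for i in range(n_in):                            # quantize one column at a time
        q = np.clip(np.round(W[:, i] / scale[:, 0]), -qmax, qmax) * scale[:, 0]
        Q[:, i] = q
        err = (W[:, i] - q) / Hinv[i, i]
        W[:, i + 1:] -= np.outer(err, Hinv[i, i + 1:])  # push error onto later columns
    return Q

def quantize_mlp_weights(state: dict, calib: dict, targets=("mlp_up", "mlp_down")) -> dict:
    """Apply the quantizer only to parameters whose name marks them as MLP weights."""
    return {name: gptq_quantize(W, calib[name]) if any(t in name for t in targets) else W
            for name, W in state.items()}
```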
Other
Rotation-aware quantization: compact right-rotations are applied before GPTQ and the inverse rotation after dequantization (sketched below).
parameters: {"mode":"hadamard","targets":["mlp_up","mlp_down"],"block_sizes":[128,256,512]}
Blockwise Walsh-Hadamard right-rotations applied during export/evaluation.
parameters: {"targets":["mlp_up","mlp_down"]}
Evaluation: sliding window eval
parameters: {"stride":64}
LR Schedule: warmdown
parameters: {"warmdown_iters":4000}
Sequence Length
train_length: 1024
eval_length: 1024
Novel Contributions
- Non-record submission using the same leaderboard metric and artifact accounting
- Rotation-aware GPTQ export with Hadamard right-rotations
- Per-layer block-size search over 128, 256, and 512 for selected MLP matrices (see the sketch after this list)
- Improved export-path val_bpb on the same checkpoint while staying under the 16MB cap
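
A sketch of the per-layer block-size search, assuming the selection criterion is round-trip reconstruction error with a round-to-nearest stand-in for GPTQ; the submission's actual criterion and quantizer may differ.

```python
# Sketch of a per-layer block-size search: for each targeted MLP matrix, try
# each candidate block size and keep the one with the smallest reconstruction
# error after rotate -> quantize -> inverse-rotate.
import numpy as np
from scipy.linalg import hadamard

def _round_trip(W: np.ndarray, block: int, bits: int = 4) -> np.ndarray:
    """Blockwise Hadamard right-rotation, round-to-nearest quantize, inverse rotation."""
    H = hadamard(block).astype(np.float64) / np.sqrt(block)
    Wr = W.reshape(W.shape[0], -1, block) @ H            # rotate each column block
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(Wr).max(axis=(1, 2), keepdims=True) / qmax + 1e-12
    Wq = np.clip(np.round(Wr / scale), -qmax, qmax) * scale
    return (Wq @ H.T).reshape(W.shape)                   # undo the rotation

def pick_block_size(W: np.ndarray, candidates=(128, 256, 512)) -> int:
    """Return the candidate block size minimizing mean squared round-trip error."""
    valid = [b for b in candidates if W.shape[1] % b == 0]
    assert valid, "no candidate block size tiles the weight width"
    return min(valid, key=lambda b: float(np.mean((W - _round_trip(W, b)) ** 2)))

# Hypothetical use over a checkpoint's targeted matrices:
# chosen = {name: pick_block_size(W) for name, W in state.items()
#           if "mlp_up" in name or "mlp_down" in name}
```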