Summary
- val_bpb: 1.0461
- Architecture: 11L/512d U-Net
- Optimizer: —
- Artifact Size: 15.64 MB
Training Techniques

Quantization
- GPTQ (bits: 6, scope: all)
- QAT (bits: 6, scope: all)
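Both GPTQ and QAT here target the same uniform int6 grid (63–64 representable levels per tensor). As a minimal sketch of that grid, the fake-quantization step below rounds weights to 6 bits and dequantizes them, as a QAT forward pass would; GPTQ proper additionally uses Hessian-aware error compensation, which is omitted here. The function name and per-tensor symmetric scaling are illustrative assumptions, not the card's exact implementation.

```python
import numpy as np

def fake_quant_int6(w, bits=6):
    """Symmetric per-tensor fake quantization (sketch): snap weights to a
    6-bit integer grid, then dequantize back to floats."""
    qmax = 2 ** (bits - 1) - 1            # 31 for int6
    scale = np.max(np.abs(w)) / qmax      # per-tensor scale (an assumption)
    if scale == 0:
        return w.copy()
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # dequantized int6 approximation
```

In QAT the rounding is applied during training with a straight-through gradient, so the network learns weights that survive the int6 snap.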
Architecture
- XSA: used in the last 4 layers. parameters: {"last_n":4}
- BigramHash: vocabulary/feature augmentation. parameters: {"vocab_size":1536}
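A plausible reading of BigramHash with vocab_size 1536 is that each consecutive token pair is hashed into one of 1536 extra feature ids appended past the base vocabulary. The hash function, the `base` offset, and the function name below are all illustrative assumptions; only the 1536-bucket size comes from the card.

```python
def bigram_hash_ids(tokens, vocab_size=1536, base=50257):
    """Map each consecutive token pair to a hashed bigram feature id in
    [base, base + vocab_size). Hash and base offset are assumptions."""
    ids = []
    for a, b in zip(tokens, tokens[1:]):
        h = (a * 1000003 + b) % vocab_size   # simple multiplicative hash
        ids.append(base + h)                 # offset past the base vocab
    return ids
```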
- RoPE: rotary position embeddings applied to 24 dimensions per head. parameters: {"dims":24}
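With dims 24, only the first 24 channels of each head get the rotary treatment and the rest pass through unchanged, a common partial-RoPE setup. The sketch below assumes the standard paired-rotation formulation and a base of 10000; the pairing layout is one convention among several.

```python
import numpy as np

def rope_partial(x, dims=24, base=10000.0):
    """Apply rotary position embeddings to the first `dims` channels of
    x (shape [seq, head_dim]); remaining channels are untouched."""
    seq, head_dim = x.shape
    half = dims // 2
    inv_freq = base ** (-np.arange(half) / half)   # per-pair frequencies
    ang = np.arange(seq)[:, None] * inv_freq[None, :]   # [seq, half]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, :half], x[:, half:dims]
    out = x.copy()
    out[:, :half] = x1 * cos - x2 * sin            # 2D rotation per pair
    out[:, half:dims] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated by a position-dependent angle, position 0 is left unchanged and the rotated block keeps its norm.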
Other
- LeakyReLU squared activation with slope 0.5. parameters: {"activation":"leaky_relu_sq","slope":0.5}
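The card does not define "LeakyReLU squared" exactly. A plain square of the LeakyReLU output would be non-monotonic on the negative side, so the sketch below assumes a sign-preserving variant, y·|y| applied to the LeakyReLU output; treat both the definition and the function name as assumptions.

```python
def leaky_relu_sq(x, slope=0.5):
    """Sign-preserving 'LeakyReLU squared' (assumed form): square the
    LeakyReLU output while keeping its sign, so the map stays monotonic."""
    y = x if x >= 0 else slope * x
    return y * abs(y)
```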
Compression
- zstd (level: null)
Evaluation
- sliding window eval
- parameters: {"order":5,"alpha":0.2,"min_count":2,"buckets":4194304,"interpolation":"hashed 5-gram score-first backward-looking mixing"}
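One plausible reading of the score-first, backward-looking scheme: each token's model probability is mixed with a hashed 5-gram estimate built only from tokens scored earlier, and the token is added to the n-gram cache only after it has been scored, so no future information leaks in. The sketch below uses Python's built-in `hash` and 2^22 buckets for brevity (the card uses 4194304); the function name and the `model_probs` dict format are illustrative assumptions.

```python
from collections import defaultdict

def eval_with_ngram_mix(model_probs, tokens, order=5, alpha=0.2,
                        min_count=2, buckets=1 << 22):
    """Backward-looking hashed 5-gram mixing (sketch): score each token
    first, then update the cache, so the cache never sees future tokens.
    model_probs[t] is assumed to be a dict {token_id: model probability}."""
    context_counts = defaultdict(int)   # hashed context  -> count
    joint_counts = defaultdict(int)     # hashed (ctx, tok) -> count
    mixed = []
    for t, tok in enumerate(tokens):
        ctx = tuple(tokens[max(0, t - order + 1):t])   # up to 4 prior tokens
        hc = hash(ctx) % buckets
        hj = hash((ctx, tok)) % buckets
        p_model = model_probs[t].get(tok, 0.0)
        # Score first: consult only counts accumulated from earlier tokens.
        if context_counts[hc] >= min_count:
            p_ngram = joint_counts[hj] / context_counts[hc]
            p = (1 - alpha) * p_model + alpha * p_ngram  # fixed-weight mix
        else:
            p = p_model                                  # no gating beyond min_count
        mixed.append(p)
        # ...then update the cache with the now-scored token.
        context_counts[hc] += 1
        joint_counts[hj] += 1
    return mixed
```

With alpha fixed at 0.2 and no target-aware gating, the mix is a plain linear interpolation whenever the hashed context has been seen at least `min_count` times.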
Novel Contributions
- Legal score-first hashed 5-gram interpolation during sliding window evaluation
- Fixed-weight linear mixing with alpha=0.20 and no target-aware gating
- Cache built only from already-scored tokens for strictly backward-looking evaluation
- Combination of XSA, BigramHash, GPTQ int6, and late QAT in an 11-layer U-Net