PR #674

Status: open

Podracing: 1.0461 BPB (3-seed mean)

by newjordan
val_bpb: 1.0461
Architecture: 11L/512d U-Net
Optimizer: (not listed)
Artifact Size: 15.64 MB
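The headline val_bpb is a bits-per-byte figure averaged over 3 seeds. The sketch below shows how bits per byte is conventionally derived from per-token losses. It assumes the losses are natural-log negative log-likelihoods (nats) and that `num_bytes` is the byte length of the raw validation text; the helper names are hypothetical and not taken from the PR.

```python
import math

def bits_per_byte(token_nll_nats, num_bytes):
    """Convert summed per-token negative log-likelihood (nats) into bits per byte."""
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / num_bytes

def seed_mean(values):
    """Arithmetic mean of per-seed scores, e.g. a 3-seed val_bpb mean."""
    return sum(values) / len(values)
```

Normalizing by bytes rather than tokens keeps the metric comparable across tokenizers with different vocabularies.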

Training Techniques

Quantization
GPTQ
bits: 6
scope: all
QAT
bits: 6
scope: all
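Both GPTQ and QAT here target a 6-bit weight grid. The sketch below shows only a plain symmetric round-to-nearest int6 quantizer and its quantize-then-dequantize ("fake quant") form, as used in a QAT forward pass. It is not GPTQ, which additionally uses second-order (Hessian) information to compensate rounding error column by column. Per-row scaling and the symmetric range [-31, 31] are assumptions.

```python
def quantize_row_int6(row, qmax=31):
    """Symmetric per-row int6 quantization (assumed range [-qmax, qmax])."""
    amax = max(abs(x) for x in row)
    scale = amax / qmax if amax > 0 else 1.0  # all-zero rows get a dummy scale
    # Round to the nearest grid point and clamp to the int6 range.
    q = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return q, scale

def fake_quant_row(row, qmax=31):
    """Quantize then dequantize, so the forward pass sees int6-rounded weights."""
    q, scale = quantize_row_int6(row, qmax)
    return [v * scale for v in q]
```

In QAT the backward pass typically treats the rounding as identity (a straight-through estimator), which this scalar sketch does not model.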
Architecture
XSA
Uses XSA in the last 4 layers.
parameters: {"last_n":4}
BigramHash
Hashed bigram (previous token, current token) features that augment the token vocabulary.
parameters: {"vocab_size":1536}
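A minimal sketch of how token bigrams can be hashed into a fixed table of 1536 slots. The hash function, the reserved slot for the first position, and the idea that each slot indexes a learned embedding added to the token embedding are all assumptions for illustration, not the PR's implementation.

```python
def bigram_bucket(prev_token, cur_token, num_buckets=1536):
    # Hypothetical hash: mix both ids with large odd multipliers, then reduce.
    # The last slot (num_buckets - 1) is reserved, so reduce modulo num_buckets - 1.
    h = (prev_token * 1_000_003) ^ (cur_token * 998_244_353)
    return h % (num_buckets - 1)

def bigram_buckets(tokens, num_buckets=1536):
    """One bucket index per position; position 0 has no predecessor and uses the reserved slot."""
    out = [num_buckets - 1]
    for prev, cur in zip(tokens, tokens[1:]):
        out.append(bigram_bucket(prev, cur, num_buckets))
    return out
```

Identical bigrams always map to the same slot, so a small table can carry bigram statistics without a full vocab-by-vocab matrix, at the cost of occasional collisions.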
RoPE
Applies rotary position embeddings to only 24 dimensions (partial RoPE).
parameters: {"dims":24}
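A sketch of partial RoPE, assuming "dims":24 means rotation is applied to the first 24 dimensions of each head vector while the remaining dimensions pass through unchanged. The head size of 64 in the usage check, the base of 10000, and the (i, i + half) pairing convention are assumptions.

```python
import math

def partial_rope(vec, pos, rope_dims=24, base=10000.0):
    """Rotate the first rope_dims entries of a head vector; leave the rest untouched."""
    assert rope_dims % 2 == 0 and rope_dims <= len(vec)
    out = list(vec)
    half = rope_dims // 2
    for i in range(half):
        # Frequency decreases geometrically across the rotated pairs.
        theta = pos * base ** (-2 * i / rope_dims)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out
```

Because each rotated pair turns by an angle proportional to position, the dot product of a rotated query and key depends only on their relative offset; the unrotated dimensions remain position-independent.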
Other
LeakyReLU²
Squared LeakyReLU activation with negative slope 0.5.
parameters: {"activation":"leaky_relu_sq","slope":0.5}
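The activation amounts to applying LeakyReLU and then squaring the result. A scalar sketch with the listed slope of 0.5:

```python
def leaky_relu_sq(x, slope=0.5):
    """Squared LeakyReLU: (x if x >= 0 else slope * x) ** 2."""
    y = x if x >= 0 else slope * x
    return y * y

# Elementwise over a list of pre-activations.
activations = [leaky_relu_sq(x) for x in (-2.0, 0.0, 2.0)]
```

Unlike ReLU², negative inputs still produce a nonzero output and gradient, scaled by slope² = 0.25 in the output.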
Compression
zstd
level: not specified
Evaluation
sliding window eval
parameters: {"order":5,"alpha":0.2,"min_count":2,"buckets":4194304,"interpolation":"hashed 5-gram score-first backward-looking mixing"}
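The entry does not list a window length or stride, so the values in the sketch below are hypothetical. It shows one common sliding-window scheme: each window is at most `seq_len` tokens long, consecutive windows advance by `stride`, and only tokens not yet scored contribute to the loss. As a result, every token is scored exactly once and most tokens see a long left context.

```python
def sliding_window_spans(num_tokens, seq_len, stride):
    """Return (ctx_start, score_start, end) triples: the model reads
    tokens[ctx_start:end], and only tokens[score_start:end] are scored."""
    assert 0 < stride <= seq_len
    end = min(seq_len, num_tokens)
    spans = [(0, 0, end)]  # the first window scores everything it sees
    while end < num_tokens:
        new_end = min(end + stride, num_tokens)
        spans.append((max(0, new_end - seq_len), end, new_end))
        end = new_end
    return spans
```

Smaller strides give each scored token more context at the cost of more forward passes.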

Novel Contributions

  • Legal score-first hashed 5-gram interpolation during sliding window evaluation
  • Fixed-weight linear mixing with alpha=0.20 and no target-aware gating
  • Cache built only from already-scored tokens for strictly backward-looking evaluation
  • Combination of XSA, BigramHash, GPTQ int6, and late QAT in an 11-layer U-Net
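The score-first 5-gram mixing described above can be sketched as follows, using the listed parameters (order 5, alpha 0.20, min_count 2, 4,194,304 buckets). The polynomial hash, the use of sparse dicts as stand-ins for bucket-sized count arrays, the clamp on collided counts, and the input format (`model_probs[t]` is the model's probability of `tokens[t]`) are assumptions; this is not the PR's code.

```python
import math

ORDER, ALPHA, MIN_COUNT, BUCKETS = 5, 0.20, 2, 4_194_304

def _bucket(ids):
    # Hypothetical polynomial hash of a token tuple into BUCKETS slots.
    h = 0
    for t in ids:
        h = (h * 1_000_003 + t + 1) % (1 << 61)
    return h % BUCKETS

def score_first_ngram_nll(tokens, model_probs):
    """Total NLL (nats) under a fixed-alpha mix of model and hashed 5-gram.

    Each token is scored BEFORE it enters the cache, so the n-gram counts
    only ever contain already-scored tokens (strictly backward-looking).
    """
    ctx_counts, full_counts = {}, {}  # sparse stand-ins for BUCKETS-sized arrays
    total = 0.0
    for t, tok in enumerate(tokens):
        p = model_probs[t]
        ctx = tuple(tokens[t - ORDER + 1:t]) if t >= ORDER - 1 else None
        if ctx is not None:
            kc, kf = _bucket(ctx), _bucket(ctx + (tok,))
            c = ctx_counts.get(kc, 0)
            # The gate depends only on the context count, never on the target.
            if c >= MIN_COUNT:
                p_ng = min(1.0, full_counts.get(kf, 0) / c)  # clamp: hash collisions
                p = (1 - ALPHA) * p + ALPHA * p_ng
        total += -math.log(p)
        # Update the cache only after the token has been scored.
        if ctx is not None:
            ctx_counts[kc] = ctx_counts.get(kc, 0) + 1
            full_counts[kf] = full_counts.get(kf, 0) + 1
    return total
```

The n-gram term only helps once a context has been seen at least `MIN_COUNT` times, so repeated spans late in the validation text benefit while first occurrences are scored by the model alone. In a full evaluation, `model_probs` would come from the sliding-window pass.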