PR #706

open

Podracing: 1.0461 BPB (3-seed mean) — 5-gram eval + LeakyReLU²

by newjordan
val_bpb
1.0461
Architecture
11L/512d U-Net
Optimizer
Artifact Size
15.64 MB

Training Techniques

Quantization
GPTQ
bits: 6
scope: all
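For orientation, the 6-bit grid that the weights end up on can be sketched as below. This is plain symmetric round-to-nearest quantization, not GPTQ itself: GPTQ additionally compensates rounding error using second-order statistics, which is omitted here. The function names and the per-list scale are assumptions made for illustration only.

```python
def quantize_int6(weights):
    """Symmetric round-to-nearest quantization onto integer codes in [-31, 31]."""
    qmax = 31  # 2**(6 - 1) - 1, the largest magnitude used on the signed 6-bit grid
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak > 0 else 1.0  # avoid dividing by zero for all-zero input
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


codes, scale = quantize_int6([0.0, 0.3, -1.0, 0.77])
restored = dequantize(codes, scale)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`).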
Architecture
XSA
Uses XSA attention with the last 4 layers modified.
parameters: {"last_n":4}
BigramHash
Adds a BigramHash component for hashed n-gram features.
parameters: {"vocab_size":1536}
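A minimal sketch of how hashed bigram features might be indexed, assuming `vocab_size` is the number of rows in the hashed table. The hash function, the multiplier, and the placeholder used for the first position are hypothetical and are not taken from the PR.

```python
def bigram_hash_ids(tokens, table_size=1536, multiplier=1_000_003):
    """Map each (previous, current) token pair to a row index in [0, table_size)."""
    ids = []
    prev = 0  # assumed placeholder: the first token has no left neighbour
    for tok in tokens:
        # Hypothetical multiplicative hash; distinct pairs may share a row.
        ids.append((prev * multiplier + tok) % table_size)
        prev = tok
    return ids


ids = bigram_hash_ids([5, 9, 5, 9])
```

The resulting indices would select rows of a small embedding table whose output is added to the token embedding. Identical bigrams always land on the same row, and collisions between different bigrams are accepted as the price of a compact table.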
RoPE
Uses partial rotary positional embeddings.
parameters: {"dimensions":24}
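Partial rotary embeddings can be sketched as follows, assuming `dimensions` is the number of leading entries of each head vector that get rotated while the rest pass through unchanged. The interleaved pairing and the base of 10000 are common conventions assumed here, not details confirmed by the PR.

```python
import math


def apply_partial_rope(vec, position, rotary_dims=24, base=10000.0):
    """Rotate the first rotary_dims entries of vec in pairs; copy the remainder as-is."""
    assert rotary_dims % 2 == 0 and rotary_dims <= len(vec)
    out = list(vec)
    for i in range(rotary_dims // 2):
        # Each pair rotates at its own frequency, scaled by the token position.
        theta = position * base ** (-2.0 * i / rotary_dims)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out


head = [float(i) for i in range(32)]  # hypothetical 32-dimensional head vector
rotated = apply_partial_rope(head, position=7)
```

Entries from index 24 onward carry no positional signal, and the rotation preserves the norm of the rotated part.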
tied embeddings
Input and output embeddings are tied.
parameters: null
Other
LeakyReLU²
LeakyReLU squared activation with slope 0.5.
parameters: {"slope":0.5}
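A scalar sketch of the activation, assuming "LeakyReLU squared" means squaring the output of a LeakyReLU with the stated negative slope. The function name is illustrative.

```python
def leaky_relu_squared(x: float, slope: float = 0.5) -> float:
    """Return LeakyReLU(x) ** 2: x**2 for x >= 0 and (slope * x)**2 otherwise."""
    y = x if x >= 0.0 else slope * x
    return y * y


positive = leaky_relu_squared(2.0)   # 4.0
negative = leaky_relu_squared(-2.0)  # 1.0, because (0.5 * -2.0) ** 2 == 1.0
```

Under this reading the output is never negative, and a negative input contributes a quarter of what a positive input of the same magnitude does when the slope is 0.5.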
Evaluation
5-gram eval interpolation
parameters: {"alpha":0.2,"order":5,"min_count":2,"buckets":4194304,"score_first":true,"legal":true}
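The parameters above suggest a procedure along the following lines. This is a sketch under stated assumptions, not the PR's code: the class and function names are hypothetical, Python dicts stand in for fixed-size count arrays, the n-gram estimate is assumed to be a simple count ratio, and the result is reported in bits per token rather than bits per byte.

```python
import math


class HashedNgramCache:
    """Hashed n-gram counts, filled only from tokens that were already scored."""

    def __init__(self, order=5, buckets=4_194_304, min_count=2, alpha=0.2):
        self.order, self.buckets = order, buckets
        self.min_count, self.alpha = min_count, alpha
        self.ctx_counts = {}   # bucket of (order - 1)-token context -> count
        self.full_counts = {}  # bucket of context + next token -> count

    def _bucket(self, tokens):
        return hash(tuple(tokens)) % self.buckets  # hypothetical hash choice

    def mix(self, history, target, p_model):
        """Interpolated probability of target; reads counts, never updates them."""
        if len(history) < self.order - 1:
            return p_model
        ctx = tuple(history[-(self.order - 1):])
        c = self.ctx_counts.get(self._bucket(ctx), 0)
        if c < self.min_count:
            return p_model  # context seen too rarely: use the model alone
        f = self.full_counts.get(self._bucket(ctx + (target,)), 0)
        p_ngram = min(f / c, 1.0)  # hash collisions could push the ratio above 1
        return (1.0 - self.alpha) * p_model + self.alpha * p_ngram

    def update(self, history, target):
        """Record target after it has been scored."""
        if len(history) < self.order - 1:
            return
        ctx = tuple(history[-(self.order - 1):])
        for table, key in ((self.ctx_counts, ctx), (self.full_counts, ctx + (target,))):
            b = self._bucket(key)
            table[b] = table.get(b, 0) + 1


def score_stream(tokens, model_probs, cache):
    """Mean bits per token, scoring each token before the cache sees it."""
    total_bits, history = 0.0, []
    for tok, p_model in zip(tokens, model_probs):
        total_bits += -math.log2(cache.mix(history, tok, p_model))  # score first
        cache.update(history, tok)                                  # then update
        history.append(tok)
    return total_bits / len(tokens)
```

The interpolation weight is fixed at `alpha` whenever the context has been seen at least `min_count` times, regardless of whether the cache happens to favor the true token.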
Test-Time Training
score-first TTT
parameters: {"disabled":true}
Compression
zstd
level: null

Novel Contributions

  • 5-gram eval interpolation using a fixed-weight hashed n-gram cache built from already-scored tokens only
  • Score-first legal evaluation: each token is scored before it enters the n-gram cache, with no safety gate or target-aware selection
  • LeakyReLU squared activation
  • XSA last-4 configuration with BigramHash and partial RoPE
  • GPTQ int6 quantization with late QAT