PR #753
openPodracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964)
by newjordan
val_bpb
0.9625
Architecture
Transformer
Optimizer
—
Artifact Size
15.71 MB
Training Techniques
Quantization
GPTQ
bits: null
scope: all
Architecture
XSA
Uses XSA as part of the base architecture / model configuration.
parameters: {"last_n":4}
Evaluation
multi-order backoff n-gram eval
parameters: {"orders":"2-7","cascade_on_miss":true}
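The backoff eval above can be sketched as follows. This is an illustrative reconstruction, not the PR's code: function and variable names are assumptions, and the counting scheme is a plain maximum-likelihood count table. The one behavior taken from the listed parameters is the cascade: try the longest order (7) first and fall through to shorter orders on a context miss.

```python
from collections import defaultdict

def build_ngram_counts(tokens, orders=range(2, 8)):
    # counts[n][context][next_token] -> occurrences, for orders 2-7.
    counts = {n: defaultdict(lambda: defaultdict(int)) for n in orders}
    for n in orders:
        for i in range(len(tokens) - n + 1):
            ctx = tuple(tokens[i:i + n - 1])
            counts[n][ctx][tokens[i + n - 1]] += 1
    return counts

def ngram_predict(counts, history):
    # Longest-context-first: try order 7 down to 2, cascading to the
    # next shorter context on a miss; None if every order misses.
    for n in sorted(counts, reverse=True):
        ctx = tuple(history[-(n - 1):])
        if len(ctx) == n - 1 and ctx in counts[n]:
            dist = counts[n][ctx]
            total = sum(dist.values())
            return {tok: c / total for tok, c in dist.items()}
    return None
```

On a repetitive stream like `[1, 2, 3, 1, 2, 3, ...]`, the history `[1, 2]` misses at orders 7-4 (context too short), hits at order 3, and returns a distribution concentrated on `3`; a history never seen at any order returns `None`.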
adaptive alpha evaluation
parameters: {"alpha_formula":"0.05 + 0.55 * sigmoid(2 * (H - 4.0))","entropy_based":true}
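The adaptive alpha is fully specified by the `alpha_formula` parameter; only the function name and the interpretation of `H` as the base model's predictive entropy in nats are assumptions here. The formula keeps alpha in (0.05, 0.60), ramping up trust in the n-gram model as the base model becomes more uncertain around an entropy of 4.0:

```python
import math

def adaptive_alpha(entropy, lo=0.05, span=0.55, center=4.0, slope=2.0):
    # alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0)), as in the PR's
    # alpha_formula; H is the base model's predictive entropy.
    sigmoid = 1.0 / (1.0 + math.exp(-slope * (entropy - center)))
    return lo + span * sigmoid
```

At the midpoint `H = 4.0` the sigmoid is 0.5, giving alpha = 0.325; very confident predictions (low H) get alpha near 0.05, very uncertain ones approach 0.60.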
Test-Time Training
score-first TTT
parameters: {"enabled":false}
Sequence Length
sequence_length
train_length: null
eval_length: null
Novel Contributions
- Multi-order backoff n-gram evaluation over orders 2-7 with longest-context-first cascading on miss
- Entropy-adaptive alpha that increases trust in the n-gram model when the base model is more uncertain
- Evaluation-time improvements only, with no training changes
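Taken together, the contributions suggest a per-token interpolation at evaluation time. The PR metadata does not spell out the mixing rule, so the following is a hypothetical sketch: linearly interpolate the base model's next-token distribution with the backed-off n-gram distribution using the entropy-adaptive alpha, and fall back to the base model alone when all n-gram orders miss.

```python
import math

def mix_with_ngram(p_model, p_ngram, entropy):
    # Hypothetical combination of the two eval-time pieces: weight the
    # n-gram distribution by the entropy-adaptive alpha, the base model
    # by (1 - alpha). Training is untouched; this runs only at eval.
    if p_ngram is None:  # cascade missed at every order 2-7
        return p_model
    alpha = 0.05 + 0.55 / (1.0 + math.exp(-2.0 * (entropy - 4.0)))
    vocab = set(p_model) | set(p_ngram)
    return {t: (1 - alpha) * p_model.get(t, 0.0)
               + alpha * p_ngram.get(t, 0.0)
            for t in vocab}
```

Because alpha never reaches 0 or 1, the mix always retains some mass from both distributions whenever the n-gram model fires, which matches the "increases trust when the base model is more uncertain" framing rather than a hard switch.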