PR #753
openPodracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964)
by newjordan
val_bpb
0.9625
Architecture
Transformer
Optimizer
—
Artifact Size
15.71 MB
Training Techniques
Quantization
GPTQ
bits: null
scope: all
Architecture
XSA
Uses XSA as part of the base architecture / model configuration.
parameters: {"last_n":4}
Evaluation
multi-order backoff n-gram eval
parameters: {"orders":"2-7","cascade_on_miss":true}
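The backoff eval above can be sketched as follows. This is an illustrative reconstruction, not the PR's code: function and variable names are assumptions, and the counting scheme is a plain maximum-likelihood count table. The one behavior taken from the listed parameters is the cascade: try the longest order (7) first and fall through to shorter orders on a context miss.

```python
from collections import defaultdict

def build_ngram_counts(tokens, orders=range(2, 8)):
    # counts[n][context][next_token] -> occurrences, for orders 2-7.
    counts = {n: defaultdict(lambda: defaultdict(int)) for n in orders}
    for n in orders:
        for i in range(len(tokens) - n + 1):
            ctx = tuple(tokens[i:i + n - 1])
            counts[n][ctx][tokens[i + n - 1]] += 1
    return counts

def ngram_predict(counts, history):
    # Longest-context-first: try order 7 down to 2, cascading to the
    # next shorter context on a miss; None if every order misses.
    for n in sorted(counts, reverse=True):
        ctx = tuple(history[-(n - 1):])
        if len(ctx) == n - 1 and ctx in counts[n]:
            dist = counts[n][ctx]
            total = sum(dist.values())
            return {tok: c / total for tok, c in dist.items()}
    return None
```

On a repetitive stream like `[1, 2, 3, 1, 2, 3, ...]`, the history `[1, 2]` misses at orders 7-4 (context too short), hits at order 3, and returns a distribution concentrated on `3`; a history never seen at any order returns `None`.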
adaptive alpha evaluation
parameters: {"alpha_formula":"0.05 + 0.55 * sigmoid(2 * (H - 4.0))","entropy_based":true}
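The adaptive alpha is fully specified by the `alpha_formula` parameter; only the function name and the interpretation of `H` as the base model's predictive entropy in nats are assumptions here. The formula keeps alpha in (0.05, 0.60), ramping up trust in the n-gram model as the base model becomes more uncertain around an entropy of 4.0:

```python
import math

def adaptive_alpha(entropy, lo=0.05, span=0.55, center=4.0, slope=2.0):
    # alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0)), as in the PR's
    # alpha_formula; H is the base model's predictive entropy.
    sigmoid = 1.0 / (1.0 + math.exp(-slope * (entropy - center)))
    return lo + span * sigmoid
```

At the midpoint `H = 4.0` the sigmoid is 0.5, giving alpha = 0.325; very confident predictions (low H) get alpha near 0.05, very uncertain ones approach 0.60.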
Test-Time Training
score-first TTT
parameters: {"enabled":false}
Sequence Length
sequence_length
train_length: null
eval_length: null
Novel Contributions
- Multi-order backoff n-gram evaluation over orders 2-7 with longest-context-first cascading on miss
- Entropy-adaptive alpha that increases trust in the n-gram model when the base model is more uncertain
- Evaluation-time improvements only, with no training changes
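Taken together, the contributions suggest a per-token interpolation at evaluation time. The PR metadata does not spell out the mixing rule, so the following is a hypothetical sketch: linearly interpolate the base model's next-token distribution with the backed-off n-gram distribution using the entropy-adaptive alpha, and fall back to the base model alone when all n-gram orders miss.

```python
import math

def mix_with_ngram(p_model, p_ngram, entropy):
    # Hypothetical combination of the two eval-time pieces: weight the
    # n-gram distribution by the entropy-adaptive alpha, the base model
    # by (1 - alpha). Training is untouched; this runs only at eval.
    if p_ngram is None:  # cascade missed at every order 2-7
        return p_model
    alpha = 0.05 + 0.55 / (1.0 + math.exp(-2.0 * (entropy - 4.0)))
    vocab = set(p_model) | set(p_ngram)
    return {t: (1 - alpha) * p_model.get(t, 0.0)
               + alpha * p_ngram.get(t, 0.0)
            for t in vocab}
```

Because alpha never reaches 0 or 1, the mix always retains some mass from both distributions whenever the n-gram model fires, which matches the "increases trust when the base model is more uncertain" framing rather than a hard switch.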