PR #702 (open)
Record: 1.0240 BPB — Multi-Order N-gram Backoff + Entropy-Adaptive Alpha (100% autonomous research via goldfish)
by lukacf
val_bpb: 1.0244
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15.79 MB
Training Techniques
Architecture
- XSA: XSA-all attention variant used in the 11-layer transformer. parameters: {"layers":11}
- SmearGate: SmearGate component included in the base architecture.
- BigramHash: BigramHash feature used in the base architecture and referenced in the prior baseline. parameters: {"size":2048}
- RoPE: partial RoPE applied to the model. parameters: {"dimensions":"16/64"}
Quantization
- int6 QAT (bits: 6, scope: all)
Compression
- zstd (level: 22)
Weight Averaging
- EMA (decay: 0.997)
Evaluation
- sliding window eval (stride: 64)
Test-Time Training
- full TTT (epochs: 100, learning_rate: 0.001)
LR Schedule
- cosine decay (scheduler: CosineAnnealingLR, t_max: 100, eta_min: 0.00001)
Regularization
- weight decay
Other
- Entropy-adaptive n-gram mixing during evaluation: reliance on the n-gram predictions increases when model entropy is high. parameters: {"alpha_formula":"0.05 + 0.35 * sigmoid(2 * (H - 4.0))"}
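The alpha formula above can be sketched directly (a minimal illustration; the function and argument names are mine, not from the submission):

```python
import math

def entropy_adaptive_alpha(H, floor=0.05, span=0.35, slope=2.0, pivot=4.0):
    """Mixing weight for the n-gram distribution: 0.05 + 0.35 * sigmoid(2 * (H - 4.0)).

    H is the model's predictive entropy at the current position. Low entropy
    keeps alpha near the 0.05 floor; high entropy pushes it toward 0.40.
    """
    return floor + span * (1.0 / (1.0 + math.exp(-slope * (H - pivot))))
```

At the pivot entropy H = 4.0 the sigmoid is 0.5, giving alpha = 0.225; the schedule is bounded in (0.05, 0.40), so the model's own distribution always keeps the majority of the weight.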
- Multi-order n-gram backoff cache used at evaluation time, backing off from 5-gram to 4-gram, 3-gram, and 2-gram contexts. parameters: {"orders":[2,3,4,5]}
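A minimal sketch of such a backoff cache, assuming simple count-based statistics (the class and method names are hypothetical, not from the submission):

```python
from collections import defaultdict

class NGramBackoffCache:
    """Count tables for orders 2..5; predict() backs off from the longest matching context."""

    def __init__(self, orders=(2, 3, 4, 5)):
        self.orders = sorted(orders, reverse=True)  # try 5-gram first, then 4, 3, 2
        # counts[n][context_tuple][next_token] -> occurrence count
        self.counts = {n: defaultdict(lambda: defaultdict(int)) for n in self.orders}

    def update(self, tokens):
        """Record tokens[-1] as the continuation of its preceding context at every order."""
        for n in self.orders:
            if len(tokens) >= n:
                ctx = tuple(tokens[-n:-1])
                self.counts[n][ctx][tokens[-1]] += 1

    def predict(self, context):
        """Return a token->probability dict from the longest order whose context was seen, else None."""
        for n in self.orders:
            ctx = tuple(context[-(n - 1):])
            if len(ctx) == n - 1 and ctx in self.counts[n]:
                bucket = self.counts[n][ctx]
                total = sum(bucket.values())
                return {tok: c / total for tok, c in bucket.items()}
        return None  # no context match at any order
```

The backoff is strict longest-match: a 5-gram hit shadows the lower orders, and only when no order matches does the caller fall back to the model distribution alone.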
Novel Contributions
- Multi-order n-gram backoff across 2-gram to 5-gram contexts
- Entropy-adaptive mixing weight based on model entropy
- Score-first eval-time n-gram cache updated only after scoring each token
- Proper distribution-preserving mixture of model and n-gram probabilities
- Autonomous research workflow using Goldfish ML and Meerkat
- 3-seed validation of the submission
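Putting the contributions together, the score-first mixture loop might look like the sketch below. This is an illustration under stated assumptions, not the submission's code: `model_probs` (a callable returning a token-probability dict) and the cache's `predict`/`update` interface are assumed, and the mixture falls back to the model alone when no n-gram context matches so the result stays a valid distribution.

```python
import math

def score_first_eval(tokens, model_probs, cache, alpha_fn):
    """Average bits-per-token; the cache is updated only AFTER each token is scored."""
    total_bits, scored = 0.0, 0
    for t in range(1, len(tokens)):
        p_model = model_probs(tokens[:t])  # assumed interface: dict token -> prob
        # model entropy in bits drives the mixing weight
        H = -sum(p * math.log2(p) for p in p_model.values() if p > 0)
        alpha = alpha_fn(H)
        p_ngram = cache.predict(tokens[:t])
        if p_ngram is None:
            p = p_model.get(tokens[t], 0.0)  # no n-gram match: model distribution alone
        else:
            # weights sum to 1, so the mixture is itself a probability distribution
            p = (1 - alpha) * p_model.get(tokens[t], 0.0) + alpha * p_ngram.get(tokens[t], 0.0)
        total_bits += -math.log2(max(p, 1e-12))
        scored += 1
        cache.update(tokens[:t + 1])  # score-first: the cache never sees the token early
    return total_bits / max(scored, 1)
```

Updating the cache only after scoring is what keeps the eval-time n-gram statistics leak-free: the token being scored never contributes to its own prediction.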