PR #846

open

Record: Two-Pass N-gram Rescoring (val_bpb 0.1434)

by himanshudongre
val_bpb
0.1434
Architecture
Transformer
Optimizer
Muon
Artifact Size
13.4 MB

Training Techniques

Quantization
GPTQ
bits: 5
scope: all
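To make the "bits: 5" setting concrete, the sketch below shows what a signed 5-bit weight grid looks like. It uses plain round-to-nearest quantization with one scale per weight group, which is not GPTQ: GPTQ additionally compensates rounding error using second-order statistics, and that step is omitted here. The function names and the symmetric [-15, 15] range are assumptions for illustration.

```python
def quantize_int5(weights):
    """Round-to-nearest onto a symmetric 5-bit grid (NOT GPTQ itself)."""
    qmax = 15  # signed 5-bit integers span [-16, 15]; we assume a symmetric [-15, 15]
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


codes, scale = quantize_int5([3.0, -1.0, 0.0])
restored = dequantize(codes, scale)
```

Each restored weight differs from the original by at most half a quantization step, which is the error GPTQ tries to spread more cleverly across the remaining weights.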
Architecture
LeakyReLU(0.9)^2
Uses a LeakyReLU squared activation variant in the transformer.
parameters: {"slope":0.9}
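As a rough illustration of the activation named above, the sketch below applies a leaky ReLU with negative slope 0.9 and then squares the result. The function name and the choice to square the output directly (which discards the sign of negative inputs) are assumptions; the summary does not spell out the exact formula.

```python
def leaky_relu_squared(x: float, slope: float = 0.9) -> float:
    """Leaky ReLU followed by squaring (assumed form of LeakyReLU(0.9)^2)."""
    y = x if x >= 0.0 else slope * x  # negative inputs are scaled by `slope`
    return y * y                      # squaring makes the output non-negative


values = [leaky_relu_squared(v) for v in (-1.0, 0.0, 2.0)]
```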
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"embeddings_optimizer":"AdamW"}
Weight Averaging
EMA
parameters: {"decays":[0.995,0.996,0.997]}
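The listed decays suggest that several exponential moving averages of the weights are tracked in parallel. A minimal sketch of that idea follows, with parameters modeled as a plain dict of floats. The class name is hypothetical, and how the PR chooses among or combines the three averages is not stated here, so this only shows the update rule.

```python
class MultiEMA:
    """Keeps one EMA copy of the parameters per decay value."""

    def __init__(self, params, decays=(0.995, 0.996, 0.997)):
        # Each shadow starts as a copy of the initial parameters.
        self.shadows = {d: dict(params) for d in decays}

    def update(self, params):
        """shadow <- decay * shadow + (1 - decay) * current, per decay."""
        for decay, shadow in self.shadows.items():
            for name, value in params.items():
                shadow[name] = decay * shadow[name] + (1.0 - decay) * value


ema = MultiEMA({"w": 0.0})
ema.update({"w": 1.0})  # call once after every optimizer step
```

A larger decay averages over a longer window of training steps, so the three copies trade smoothness against recency.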
Evaluation
two-pass n-gram rescoring
parameters: {"rescore_chunks":15,"cold_cache_rescoring":true}
Test-Time Training
score-first TTT
parameters: {"optimizer":"AdamW","temperature":0.98,"chunk_size":2048}
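The ordering implied by "score-first" can be sketched as a loop that records the loss on each chunk before adapting on it, so no chunk is scored by weights that have already trained on it. The toy below is an assumption-heavy stand-in: the "model" is a single float, the loss is mean squared error, and plain SGD replaces the AdamW optimizer listed above. The temperature setting is not modeled, and all function names are hypothetical.

```python
def score_first_ttt(chunks, state, score_fn, adapt_fn):
    """Score each chunk with the current state, then adapt on that chunk."""
    losses = []
    for chunk in chunks:
        losses.append(score_fn(state, chunk))  # score before any update on this chunk
        state = adapt_fn(state, chunk)         # only now train on the scored chunk
    return losses, state


def mse(mean, chunk):
    """Toy loss: mean squared error of a constant prediction."""
    return sum((mean - x) ** 2 for x in chunk) / len(chunk)


def sgd_step(mean, chunk, lr=0.5):
    """Toy adaptation: one SGD step on the MSE loss (stand-in for AdamW)."""
    grad = sum(2.0 * (mean - x) for x in chunk) / len(chunk)
    return mean - lr * grad


losses, final_state = score_first_ttt([[1.0, 1.0], [1.0, 1.0]], 0.0, mse, sgd_step)
```

In this toy the first chunk is scored by the unadapted state, and only the second chunk benefits from the update.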
Other
other
Entropy-adaptive order-2-to-9 n-gram backoff with 4M hash buckets.
parameters: {"order_range":"2-9","hash_buckets":4000000}
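One way to read this entry is as a hashed count table queried from the highest order downward, with the n-gram estimate mixed into the model's prediction more strongly when the model is uncertain. The sketch below follows that reading. The class and function names, the use of Python dicts in place of fixed-size arrays, and the specific mixing rule (weight equal to normalized model entropy) are all assumptions, not details taken from the PR.

```python
import math

ORDERS = range(2, 10)      # n-gram orders 2..9, as listed above
NUM_BUCKETS = 4_000_000    # hash bucket count, as listed above


class HashedNgramCache:
    """Toy hashed n-gram counter with highest-order-first backoff."""

    def __init__(self, num_buckets=NUM_BUCKETS):
        self.num_buckets = num_buckets
        self.context_counts = {}  # sparse dicts stand in for count arrays
        self.ngram_counts = {}

    def _bucket(self, tokens):
        return hash(tokens) % self.num_buckets

    def update(self, history, token):
        """Count `token` after the last n-1 tokens, for every usable order n."""
        for n in ORDERS:
            if len(history) < n - 1:
                break
            context = tuple(history[-(n - 1):])
            c = self._bucket(context)
            g = self._bucket(context + (token,))
            self.context_counts[c] = self.context_counts.get(c, 0) + 1
            self.ngram_counts[g] = self.ngram_counts.get(g, 0) + 1

    def prob(self, history, token):
        """Back off from order 9 down to 2; None if no context was seen."""
        for n in reversed(ORDERS):
            if len(history) < n - 1:
                continue
            context = tuple(history[-(n - 1):])
            seen = self.context_counts.get(self._bucket(context), 0)
            if seen > 0:
                hits = self.ngram_counts.get(self._bucket(context + (token,)), 0)
                return min(1.0, hits / seen)  # clamp in case of hash collisions
        return None


def mix(model_probs, token, ngram_p):
    """Assumed rule: trust the n-gram more when model entropy is high."""
    if ngram_p is None:
        return model_probs[token]
    entropy = -sum(p * math.log(p) for p in model_probs if p > 0.0)
    lam = min(1.0, max(0.0, entropy / math.log(len(model_probs))))
    return (1.0 - lam) * model_probs[token] + lam * ngram_p


cache = HashedNgramCache()
sequence = [1, 2, 3, 1, 2, 3]
for i, tok in enumerate(sequence):
    cache.update(sequence[:i], tok)
p = mix([0.25, 0.25, 0.25, 0.25], 3, cache.prob([1, 2], 3))
```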

Novel Contributions

  • Two-pass n-gram evaluation that rescores early chunks with the complete cache
  • Cold-cache penalty reduction for early validation chunks
  • Backward-looking, compliant rescoring of tokens already evaluated in pass 1
  • Combination of score-first TTT, GPTQ-Int5 export, and n-gram rescoring in a single pipeline
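The two-pass idea in the list above can be sketched with a toy cache. Pass 1 scores chunks in order while the cache fills, so the earliest chunks are scored against a nearly empty (cold) cache. Pass 2 rescores only the first few chunks using the cache as it stands after pass 1. In this sketch a unigram count table with add-one smoothing stands in for the PR's n-gram cache, and the function names and the way rescored values replace pass-1 values are assumptions.

```python
import math
from collections import Counter


def chunk_nll(chunk, counts, vocab_size):
    """Negative log-likelihood of a chunk under add-one smoothed counts."""
    total = sum(counts.values())
    return sum(-math.log((counts[t] + 1) / (total + vocab_size)) for t in chunk)


def two_pass_scores(chunks, vocab_size, rescore_chunks=15):
    counts = Counter()
    pass1 = []
    for chunk in chunks:
        pass1.append(chunk_nll(chunk, counts, vocab_size))  # score with cache so far
        counts.update(chunk)                                # then add chunk to cache
    pass2 = list(pass1)
    # Rescore only the earliest chunks, now against the complete cache.
    for i in range(min(rescore_chunks, len(chunks))):
        pass2[i] = chunk_nll(chunks[i], counts, vocab_size)
    return pass1, pass2


pass1, pass2 = two_pass_scores([[0, 0], [0, 0], [0, 0]], vocab_size=2, rescore_chunks=1)
```

In this toy run the first chunk's loss drops once it is rescored with the full cache, while chunks outside the rescoring window keep their pass-1 scores.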