PR #853

open

Record: Two-Pass Order-12 N-gram Backoff + 256K Chunks — 0.1315 BPB

by quietsmile
val_bpb: 0.1315
Artifact Size: ~13.4 MB

Training Techniques

Evaluation
  • two-pass n-gram rescoring
    parameters: {"rescore_chunks":50,"pass_1_builds_complete_cache":true,"pass_2_uses_full_cache":true}
  • n-gram backoff with extended order
    parameters: {"max_order":12,"extended_hash_primes":true}
  • larger chunked cache refresh
    parameters: {"chunk_tokens":262144,"alpha_max":0.7}
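A minimal Python sketch of hashed n-gram backoff up to order 12, assuming a design this card does not spell out: each context is hashed into a bucket by XOR-ing every context token times a per-position prime, lookup backs off from order 12 down to order 2 until it finds a context seen before, and the n-gram estimate is blended with the model probability using a weight capped at alpha_max. The names `HashedNgramCache`, `next_primes`, and `mixed_prob`, the table size, the choice of primes just above 1,000,000, and the order-scaled alpha rule are all hypothetical, not the PR's code.

```python
import math
from collections import Counter, defaultdict


def next_primes(start, count):
    """Return the first `count` primes strictly greater than `start` (trial division)."""
    primes, n = [], start + 1
    while len(primes) < count:
        if n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1)):
            primes.append(n)
        n += 1
    return primes


class HashedNgramCache:
    """Hypothetical hashed n-gram counts for orders 2..max_order with backoff."""

    def __init__(self, max_order=12, table_size=1 << 20):
        self.max_order = max_order
        self.table_size = table_size  # hypothetical bucket count
        # An order-n gram has n-1 context tokens, so orders up to 12 need 11 primes.
        self.primes = next_primes(1_000_000, max_order - 1)
        # tables[n][bucket] -> Counter of next tokens observed after that context
        self.tables = {n: defaultdict(Counter) for n in range(2, max_order + 1)}
        self.history = []

    def _bucket(self, context):
        h = 0
        for i, tok in enumerate(reversed(context)):  # most recent token uses primes[0]
            h ^= (tok + 1) * self.primes[i]
        return h % self.table_size

    def update(self, tokens):
        """Add tokens to the cache; call only after those tokens have been scored."""
        for tok in tokens:
            for n in range(2, self.max_order + 1):
                if len(self.history) < n - 1:
                    break  # not enough history for this order or any higher one
                ctx = self.history[-(n - 1):]
                self.tables[n][self._bucket(ctx)][tok] += 1
            self.history.append(tok)

    def lookup(self, context):
        """Back off from max_order to 2; return (order, counts), or (0, None) if unseen."""
        for n in range(self.max_order, 1, -1):
            if len(context) < n - 1:
                continue
            counts = self.tables[n].get(self._bucket(context[-(n - 1):]))
            if counts:
                return n, counts
        return 0, None


def mixed_prob(p_model, cache, context, token, alpha_max=0.7):
    """Blend model and n-gram probabilities (hypothetical alpha rule)."""
    order, counts = cache.lookup(context)
    if counts is None:
        return p_model  # no matching context at any order: use the model alone
    p_ngram = counts[token] / sum(counts.values())
    alpha = alpha_max * order / cache.max_order  # longer matches get more weight
    return (1 - alpha) * p_model + alpha * p_ngram
```

For example, after `cache = HashedNgramCache(max_order=4)` and `cache.update([1, 2, 3, 1, 2, 3])`, `cache.lookup([1, 2])` backs off past order 4 (too little context) and returns the order-3 counts for token 3. Calling `update` only after a chunk has been scored keeps the sketch score-first.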
Sequence Length
  • sequence_length
    train_length: null
    eval_length: 262144
Test-Time Training
  • none
    parameters: {"enabled":0}

Novel Contributions

  • Combines two-pass n-gram rescoring with order-12 n-gram backoff and 256K token chunks
  • Rescores the first 50 cold-cache chunks (50 × 262,144 = 13,107,200 tokens) using the complete cache built in pass 1
  • Extends the set of n-gram hash primes to support orders 10–12
  • Uses 256K-token (262,144) chunks and alpha_max=0.70 for faster cache refresh
  • Maintains score-first compliance with no test-time training
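The two-pass schedule can be sketched in Python under stated assumptions: a toy exact-bigram cache stands in for the order-12 hashed cache, a uniform distribution stands in for the neural model, the mixing weight is a fixed 0.7 rather than an adaptive one, the result is bits per token rather than bits per byte, and pass 2 simply replaces the pass-1 scores of the first `rescore_chunks` chunks. `BigramCache`, `chunk_bits`, and `two_pass_bits` are hypothetical names, not the PR's code.

```python
import math
from collections import Counter, defaultdict


class BigramCache:
    """Toy stand-in for the order-12 cache: exact bigram counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, tokens, prev):
        for tok in tokens:
            if prev is not None:
                self.counts[prev][tok] += 1
            prev = tok

    def prob(self, prev, tok):
        c = self.counts.get(prev)  # .get avoids creating empty entries
        return None if not c else c[tok] / sum(c.values())


def chunk_bits(cache, chunk, prev, vocab_size, alpha=0.7):
    """Total bits for one chunk: uniform 'model' mixed with the cache estimate."""
    p_model = 1.0 / vocab_size  # placeholder for the neural model's probability
    bits = 0.0
    for tok in chunk:
        p_ngram = cache.prob(prev, tok)
        p = p_model if p_ngram is None else (1 - alpha) * p_model + alpha * p_ngram
        bits -= math.log2(p)
        prev = tok
    return bits


def two_pass_bits(tokens, chunk_tokens, rescore_chunks, vocab_size):
    chunks = [tokens[i:i + chunk_tokens] for i in range(0, len(tokens), chunk_tokens)]
    prevs = [None] + [c[-1] for c in chunks[:-1]]  # token preceding each chunk
    cache, per_chunk = BigramCache(), []
    # Pass 1 (score-first): score each chunk, then add its tokens to the cache.
    for chunk, prev in zip(chunks, prevs):
        per_chunk.append(chunk_bits(cache, chunk, prev, vocab_size))
        cache.update(chunk, prev)
    # Pass 2: re-score the earliest (cold-cache) chunks against the complete cache.
    for i in range(min(rescore_chunks, len(chunks))):
        per_chunk[i] = chunk_bits(cache, chunks[i], prevs[i], vocab_size)
    return sum(per_chunk) / len(tokens)
```

With `tokens = [0, 1] * 8` and 4-token chunks, `rescore_chunks=0` reproduces a single score-first pass, while `rescore_chunks=1` lowers the average because the first chunk, scored against an empty cache in pass 1, is re-scored with counts from the whole stream. Only the earliest chunks presumably benefit much, since later chunks already saw most of the stream's statistics in pass 1.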