PR #868

open

Record: Budgeted Two-Pass N-gram Backoff — val_bpb 0.11814796 (3-seed mean)

by aamodbhatt
val_bpb
0.1181
Architecture
Transformer
Optimizer
Artifact Size
13.44 MB

Training Techniques

Evaluation
stride-based eval
parameters: {"two_pass":true,"rescore_chunks":72,"order":12}
score-first eval
parameters: {"enabled":true}
Other
other
Budgeted two-pass tuner that dynamically caps rescoring chunks based on observed throughput and remaining evaluation budget
parameters: {"target_seconds":580,"safety_seconds":8}
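The budget rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the function name `cap_rescore_chunks` and the `elapsed_seconds` and `seconds_per_chunk` inputs are hypothetical, and only `target_seconds=580`, `safety_seconds=8`, and the requested count of 72 chunks come from the parameters listed here.

```python
def cap_rescore_chunks(requested_chunks, elapsed_seconds, seconds_per_chunk,
                       target_seconds=580.0, safety_seconds=8.0):
    """Return how many rescoring chunks still fit in the evaluation budget.

    Hypothetical helper: assumes throughput is summarized as the measured
    average seconds per rescored chunk, and that the first pass has already
    consumed `elapsed_seconds` of the budget.
    """
    if seconds_per_chunk <= 0:
        raise ValueError("seconds_per_chunk must be positive")

    # Time left after reserving the safety margin.
    remaining = target_seconds - safety_seconds - elapsed_seconds
    if remaining <= 0:
        return 0

    # Whole chunks that fit in the remaining time, never more than requested.
    affordable = int(remaining // seconds_per_chunk)
    return min(requested_chunks, affordable)
```

For example, with 500 s already elapsed and a measured 2 s per chunk, 72 s remain after the safety margin, so the cap would be 36 chunks rather than the requested 72.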

Novel Contributions

  • Budgeted two-pass N-gram backoff evaluation
  • Dynamic rescoring chunk cap based on eval budget
  • Order-12 N-gram backoff interpolation with weighted high-order backoff
  • Score-first evaluation path maintained without tokenizer or dataset changes
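To make the backoff interpolation and the score-first ordering concrete, here is a toy sketch. It is an assumption-laden illustration rather than the submission's implementation: the class `BackoffNgram`, the function `score_first_bits`, the fixed `high_order_weight`, and the uniform base distribution are all hypothetical choices, and the two-pass rescoring step is not modeled.

```python
import math
from collections import defaultdict


class BackoffNgram:
    """Toy count-based n-gram model with interpolated backoff (hypothetical)."""

    def __init__(self, max_order=12, vocab_size=256, high_order_weight=0.8):
        self.max_order = max_order
        self.vocab_size = vocab_size
        self.weight = high_order_weight  # assumed fixed weight on the longer context
        # counts[k][context][token], where context is the last k tokens
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order)]

    def prob(self, history, token):
        # Start from a uniform distribution, then mix in longer contexts.
        p = 1.0 / self.vocab_size
        for k in range(min(self.max_order, len(history) + 1)):
            ctx = tuple(history[len(history) - k:])
            table = self.counts[k].get(ctx)
            if not table:
                break  # an unseen context implies all longer ones are unseen too
            total = sum(table.values())
            p = self.weight * table.get(token, 0) / total + (1 - self.weight) * p
        return p

    def update(self, history, token):
        # Record the token under every context length available so far.
        for k in range(min(self.max_order, len(history) + 1)):
            ctx = tuple(history[len(history) - k:])
            self.counts[k][ctx][token] += 1


def score_first_bits(tokens, model):
    """Score each token before the model sees it, then update the counts."""
    total_bits = 0.0
    for i, tok in enumerate(tokens):
        history = tokens[max(0, i - (model.max_order - 1)):i]
        total_bits -= math.log2(model.prob(history, tok))  # score first
        model.update(history, tok)                         # then learn
    return total_bits
```

Because every token is scored before its counts are added, the statistics used for a prediction never include the token being predicted, which is the property the score-first path is meant to preserve.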