PR #888

open

Record: Fast Full-Rescore N-gram — val_bpb 0.09420444 (3-seed mean)

by aamodbhatt
val_bpb: 0.0942
Architecture: Transformer
Optimizer: not listed
Artifact Size: 13.44 MB

Training Techniques

Evaluation
full-rescore
parameters: {"two_pass":true,"score_first":true}
Other
Score-first N-gram evaluation that stores per-token neural probabilities and entropy in pass 1, builds a full N-gram cache from the scored tokens, and rescores all chunks in pass 2 without a second neural forward pass.
parameters: {"pass1_records_token_stats":true,"pass2_no_second_forward_pass":true}
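The two-pass flow above can be sketched as follows. This is a minimal illustration, not the PR's code: the function names are hypothetical, the cache uses a single fixed order with no backoff, and the entropy-proportional mixing weight is an assumed rule chosen only to show where the stored entropy would be used. The key property it does reflect is that pass 2 reads only the stored per-token statistics and never calls the neural model again.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Context = Tuple[int, ...]


def pass1_token_stats(
    tokens: Sequence[int], dists: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Pass 1: `dists[i]` is the neural predictive distribution for position i,
    produced by the single forward pass. Keep only the probability assigned to
    the true token and the entropy (in nats) of the distribution."""
    probs: List[float] = []
    ents: List[float] = []
    for tok, dist in zip(tokens, dists):
        probs.append(dist[tok])
        ents.append(-sum(p * math.log(p) for p in dist if p > 0.0))
    return probs, ents


def build_full_cache(tokens: Sequence[int], order: int) -> Dict[Context, Counter]:
    """Count the next token after every (order - 1)-token context across the
    whole already-scored stream, not just a causal prefix."""
    cache: Dict[Context, Counter] = defaultdict(Counter)
    for i in range(order - 1, len(tokens)):
        cache[tuple(tokens[i - order + 1:i])][tokens[i]] += 1
    return cache


def pass2_rescore(
    tokens: Sequence[int],
    probs: Sequence[float],
    ents: Sequence[float],
    cache: Dict[Context, Counter],
    order: int,
    max_entropy: float,
) -> float:
    """Pass 2: mix stored neural probabilities with n-gram estimates and return
    the total cost in bits. No neural forward pass happens here."""
    total_bits = 0.0
    for i, tok in enumerate(tokens):
        p = probs[i]
        if i >= order - 1:
            counts = cache.get(tuple(tokens[i - order + 1:i]))
            if counts:
                p_ngram = counts[tok] / sum(counts.values())
                # Hypothetical rule: lean on the cache when the model is unsure.
                alpha = min(1.0, ents[i] / max_entropy)
                p = (1.0 - alpha) * p + alpha * p_ngram
        total_bits -= math.log2(p)
    return total_bits
```

Because the cache is built from the full stream, each rescored position's own occurrence is included in `counts`. On a toy alternating sequence such as `[0, 1, 0, 1, 0, 1]` with uniform neural distributions, every position after the first therefore scores close to zero bits. That self-counting is presumably the kind of effect a self-exclusion control is meant to address.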
Robustness controls for N-gram rescoring using self-exclusion and confidence-gain gating.
parameters: {"NGRAM_SELF_EXCLUDE":0,"NGRAM_COUNT_CONF_GAIN":0}
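The two robustness knobs are described only by name and by the values `NGRAM_SELF_EXCLUDE=0` and `NGRAM_COUNT_CONF_GAIN=0`. The sketch below shows one plausible reading of each, under stated assumptions: self-exclusion is taken to mean subtracting the scored position's own occurrence from a full-stream cache, and confidence-gain gating is taken to mean shrinking the n-gram mixing weight by `count / (count + gain)`. Both formulas and the function names are hypothetical and are not taken from the PR.

```python
from collections import Counter
from typing import Tuple


def ngram_prob(counts: Counter, tok: int, self_exclude: bool) -> Tuple[float, int]:
    """Return (p_ngram, context_count) for `tok` given one context's counts.

    Assumes the cache was built from a stream that includes the position being
    rescored. With self_exclude=True, that single occurrence is subtracted so a
    token cannot vote for itself."""
    total = sum(counts.values())
    hits = counts[tok]
    if self_exclude:
        total -= 1
        hits -= 1
    if total <= 0:
        return 0.0, 0
    return hits / total, total


def gated_alpha(base_alpha: float, context_count: int, conf_gain: float) -> float:
    """Shrink the n-gram mixing weight when the context has few observations.

    Uses count / (count + conf_gain); conf_gain=0 leaves base_alpha unchanged."""
    if context_count <= 0:
        return 0.0  # no n-gram evidence left: rely on the neural model alone
    return base_alpha * context_count / (context_count + conf_gain)
```

Under this reading, a gain of 0 makes the gate a no-op whenever the context still has observations. Whether the PR's 0 values mean "disabled" is not stated in the source.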
Sequence Length
train_length: null
eval_length: 262144

Novel Contributions

  • Added a score-first full-rescore path in N-gram evaluation
  • Stored per-token neural probabilities and entropy during the first pass
  • Built a full N-gram cache from scored tokens
  • Rescored all chunks in pass 2 without a second neural forward pass
  • Added robustness knobs for self-exclusion and confidence-gain gating
  • Achieved a 3-seed mean val_bpb of 0.09420444 under the 16 MB submission limit
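The source does not spell out how val_bpb or the 3-seed mean are computed. The sketch below assumes the usual bits-per-byte definition: total negative log2-likelihood of the validation tokens divided by the number of UTF-8 bytes of the underlying text. It also assumes the "3-seed mean" is a plain arithmetic mean over three runs. The function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def bits_per_byte(token_probs: Sequence[float], num_bytes: int) -> float:
    """Total negative log2-likelihood of the scored tokens divided by the number
    of UTF-8 bytes those tokens decode to."""
    total_bits = -sum(math.log2(p) for p in token_probs)
    return total_bits / num_bytes


def multi_seed_mean(per_seed_bpb: Sequence[float]) -> float:
    """Arithmetic mean of the per-seed scores."""
    return mean(per_seed_bpb)
```

Normalizing by bytes rather than tokens keeps the score comparable across submissions that use different tokenizers.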