PR #881

open

Record: WaterLOO — Full-Rescore N-gram Cache with Self-Exclusion (val_bpb 0.0990)

by simon-marcus
val_bpb
0.0990
Architecture
Transformer
Optimizer
Artifact Size
~15.87 MB

Training Techniques

Evaluation
sliding window eval
parameters: null
Other
other
Full-rescore two-pass n-gram cache evaluation over the entire validation stream using a prebuilt global cache
parameters: {"ngram_orders":"2-12","full_stream_rescore":true}
other
Leave-one-out self-exclusion during pass 2 by subtracting each token's own context and context-target counts before scoring
parameters: null
other
Vectorized cache construction using np.bincount
parameters: null
other
Complementary training enabled
parameters: null
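
The np.bincount cache construction listed above could look roughly like the following sketch for a single n-gram order. Everything beyond the use of np.bincount is an assumption made for illustration: the function name build_counts, the hashed bucket table, the bucket count, and the hash multiplier are hypothetical and are not taken from the PR.

```python
import numpy as np

def build_counts(tokens, order, num_buckets=1 << 16):
    """Count hashed contexts and (context, target) pairs for one n-gram order.

    `order` is the full n-gram length, so each context is the preceding
    order-1 tokens. Bucket count and hash multiplier are illustrative only.
    """
    tokens = np.asarray(tokens, dtype=np.int64)
    n_ctx = order - 1
    n_pos = len(tokens) - n_ctx  # positions that have a full context
    if n_ctx < 1 or n_pos <= 0:
        raise ValueError("need order >= 2 and more tokens than the context length")

    # Polynomial hash of every context at once: one vectorized step per
    # context token, with no Python loop over positions.
    ctx_hash = np.zeros(n_pos, dtype=np.int64)
    for k in range(n_ctx):
        ctx_hash = (ctx_hash * 1_000_003 + tokens[k:k + n_pos]) % num_buckets

    targets = tokens[n_ctx:]
    pair_hash = (ctx_hash * 1_000_003 + targets) % num_buckets

    # np.bincount turns the hashed indices into count tables in a single pass.
    ctx_counts = np.bincount(ctx_hash, minlength=num_buckets)
    pair_counts = np.bincount(pair_hash, minlength=num_buckets)
    return ctx_counts, pair_counts, ctx_hash, pair_hash
```

Running this once per order in the 2-12 range over the whole validation stream would give a global cache of the kind the first pass needs. Hash collisions can inflate counts in a bucketed table like this one, so the sketch trades exactness for speed.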
Sequence Length
sequence_length
train_length: null
eval_length: null

Novel Contributions

  • Full-rescore n-gram cache evaluated over the entire validation stream
  • Leave-one-out self-exclusion that removes each token's own cache contribution during rescoring
  • Fast vectorized cache construction with np.bincount
  • Demonstration that the full-rescore architecture remains strong even without self-inclusion
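
As a minimal sketch of the leave-one-out idea in the second bullet: during rescoring, each position subtracts one from both its context count and its context-target count before forming the n-gram estimate, so a token cannot be predicted from its own occurrence. The function name loo_probs, the array layout, and the zero fallback for contexts seen only once are assumptions for illustration. How the PR combines this estimate with the model's own probability is not stated here and is left out.

```python
import numpy as np

def loo_probs(ctx_counts, pair_counts, ctx_hash, pair_hash):
    """Leave-one-out n-gram estimate for every position (hypothetical helper).

    ctx_counts / pair_counts: global count tables built over the full stream.
    ctx_hash / pair_hash: per-position indices into those tables.
    """
    # Remove this position's own contribution from both counts.
    c = ctx_counts[ctx_hash] - 1
    m = pair_counts[pair_hash] - 1
    # Guard against hash collisions pushing the pair count above the context count.
    m = np.minimum(m, c)

    probs = np.zeros(len(ctx_hash), dtype=np.float64)
    # Positions whose context has no other occurrence keep the fallback of 0.
    np.divide(m, c, out=probs, where=c > 0)
    return probs

# Toy example: context 0 occurs three times (targets a, a, b); context 1 occurs once.
ctx_hash = np.array([0, 0, 0, 1])
pair_hash = np.array([0, 0, 1, 2])
ctx_counts = np.bincount(ctx_hash)
pair_counts = np.bincount(pair_hash)
print(loo_probs(ctx_counts, pair_counts, ctx_hash, pair_hash))
```

In the toy example the last position would score 1.0 with self-inclusion, purely from its own count. With self-exclusion it falls back to 0, which is the leak the second pass is meant to remove.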