PR #959

open

Record: Nacrith Log-Bias + Full-Rescore N-gram — val_bpb 0.00000035 (3-seed mean)

by himanalotView on GitHub
val_bpb
0.0000
Architecture
Transformer
Optimizer
SGD
Artifact Size
13.44 MB

Training Techniques

Evaluation
full-rescore two-pass N-gram
parameters: {"NGRAM_FULL_RESCORE":1,"NGRAM_TWO_PASS_ENABLED":1,"NGRAM_EVAL_MAX_ORDER":13,"NGRAM_EVAL_CHUNK_TOKENS":262144,"NGRAM_EVAL_BUCKETS":4194304,"NGRAM_EVAL_ALPHA_MAX":0.85,"NGRAM_SELF_EXCLUDE":0,"NGRAM_COUNT_CONF_GAIN":0}
Other
other
Nacrith-style adaptive log-space bias updated online after each scored token
parameters: {"ONLINE_CAL":3,"BIAS_LR":0.05,"BIAS_DECAY":1}
other
Log-odds mixing / PAQ-style logit-space interpolation
parameters: {"BLEND_MODE":"logodds"}
Optimizer
SGD
weight_decay: null
momentum: null
other_params: {"learning_rate":0.05}
Regularization
weight decay
parameters: {"BIAS_DECAY":1}

Novel Contributions

  • Nacrith-style adaptive log-space bias for evaluation-time correction
  • Log-odds mixing instead of linear probability mixing
  • Full-rescore two-pass N-gram evaluation using a complete cache
  • Near-zero BPB achieved via online per-token bias adaptation