PR #959

open

Record: Nacrith Log-Bias + Full-Rescore N-gram — val_bpb 0.00000035 (3-seed mean)

by himanalotView on GitHub

val_bpb

0.0000

Architecture

Transformer

Optimizer

SGD

Artifact Size

13.44 MB

Training Techniques

Evaluation

full-rescore two-pass N-gram

parameters: {"NGRAM_FULL_RESCORE":1,"NGRAM_TWO_PASS_ENABLED":1,"NGRAM_EVAL_MAX_ORDER":13,"NGRAM_EVAL_CHUNK_TOKENS":262144,"NGRAM_EVAL_BUCKETS":4194304,"NGRAM_EVAL_ALPHA_MAX":0.85,"NGRAM_SELF_EXCLUDE":0,"NGRAM_COUNT_CONF_GAIN":0}

Other

other

Nacrith-style adaptive log-space bias updated online after each scored token

parameters: {"ONLINE_CAL":3,"BIAS_LR":0.05,"BIAS_DECAY":1}

other

Log-odds mixing / PAQ-style logit-space interpolation

parameters: {"BLEND_MODE":"logodds"}

Optimizer

SGD

weight_decay: null

momentum: null

other_params: {"learning_rate":0.05}

Regularization

weight decay

parameters: {"BIAS_DECAY":1}

Novel Contributions

Nacrith-style adaptive log-space bias for evaluation-time correction
Log-odds mixing instead of linear probability mixing
Full-rescore two-pass N-gram evaluation using a complete cache
Near-zero BPB achieved via online per-token bias adaptation