PR #2129

open

Non-record: Confidence-Adaptive N-gram Boost on PR #2018 stack, val_bpb=1.05874

val_bpb
1.0587
Architecture
Transformer
Optimizer
Artifact Size
15,990,227 bytes

Training Techniques

Evaluation
sliding window eval
parameters: null
Test-Time Training
full TTT
parameters: {"mode":"phased","single_pass":true}
Other
other
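Confidence-adaptive n-gram boost that shrinks the token boost as the model's predicted probability q_hint_t of the hinted token rises: beta_t = TOKEN_BOOST * (1 - q_hint_t)^gamma. With the default gamma=0 the factor is 1, so the boost stays constant at TOKEN_BOOST.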
parameters: {"gamma_env_var":"ADAPTIVE_BOOST_GAMMA","default_gamma":0}
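The scaling rule above can be sketched in a few lines of Python. This is a minimal illustration of the stated formula only, not the PR's actual code: the function names `adaptive_boost` and `gamma_from_env` are hypothetical, and how beta_t is then applied to the model's distribution is not described here and is left out.

```python
import os
from typing import Mapping


def gamma_from_env(env: Mapping[str, str] = os.environ) -> float:
    """Read gamma from ADAPTIVE_BOOST_GAMMA, falling back to the stated default of 0.

    Hypothetical helper: the env var name and default come from the card,
    the parsing logic is an assumption.
    """
    return float(env.get("ADAPTIVE_BOOST_GAMMA", "0"))


def adaptive_boost(token_boost: float, q_hint: float, gamma: float) -> float:
    """Compute beta_t = TOKEN_BOOST * (1 - q_hint_t)^gamma for one position.

    q_hint is the model's predicted probability of the hinted token.
    """
    if not 0.0 <= q_hint <= 1.0:
        raise ValueError("q_hint must be a probability in [0, 1]")
    # gamma = 0 gives a factor of 1, i.e. the non-adaptive constant boost.
    return token_boost * (1.0 - q_hint) ** gamma


if __name__ == "__main__":
    gamma = gamma_from_env()
    for q in (0.1, 0.5, 0.9):
        print(q, adaptive_boost(2.0, q, gamma))
```

With gamma > 0 the boost is largest where the model assigns low probability to the hinted token and fades toward zero as the model becomes confident in it; larger gamma makes that fade steeper.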
other
Strict token-only n-gram tilt path with within/word/agree boosts disabled
parameters: {"WITHIN_BOOST":0,"WORD_BOOST":0,"AGREE_ADD_BOOST":0}
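As a rough sketch, the strict token-only setting can be checked before an evaluation run. The three parameter names and their zero values come from the card; the helper `is_strict_token_only` and the idea of passing the settings as a plain mapping are assumptions for illustration, not part of the PR.

```python
from typing import Mapping

# Boosts that the card lists as disabled on the strict token-only path.
DISABLED_BOOSTS = ("WITHIN_BOOST", "WORD_BOOST", "AGREE_ADD_BOOST")


def is_strict_token_only(settings: Mapping[str, float]) -> bool:
    """Return True only if every non-token boost is explicitly present and zero.

    Hypothetical helper: a missing key is treated as "not strict" because
    the stack's own defaults are not stated in the card.
    """
    return all(
        name in settings and float(settings[name]) == 0.0
        for name in DISABLED_BOOSTS
    )


if __name__ == "__main__":
    card_settings = {"WITHIN_BOOST": 0, "WORD_BOOST": 0, "AGREE_ADD_BOOST": 0}
    print(is_strict_token_only(card_settings))
```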

Novel Contributions

  • Confidence-Adaptive N-gram Boost
  • Per-position scaling of n-gram tilt by the model's own confidence on the hinted token
  • Two-line adaptive-gamma wiring on top of the PR #2018 strict token-only n-gram tilt stack
  • Reported consistent improvement for gamma in {1, 2} with best result at gamma=1