PR #2129

open

Non-record: Confidence-Adaptive N-gram Boost on PR #2018 stack, val_bpb=1.05874

by okezue

val_bpb

1.0587

Architecture

Transformer

Optimizer

—

Artifact Size

15,990,227 bytes

Training Techniques

Evaluation

sliding window eval

parameters: null

Test-Time Training

full TTT

parameters: {"mode":"phased","single_pass":true}

Other

other

Confidence-adaptive n-gram boost that scales the token boost by the model's predictive confidence on the hinted token: beta_t = TOKEN_BOOST * (1 - q_hint_t)^gamma

parameters: {"gamma_env_var":"ADAPTIVE_BOOST_GAMMA","default_gamma":0}
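The boost formula above can be sketched as a small helper. This is a minimal illustration, not the PR's code: the function name `adaptive_boost` and its argument names are hypothetical, and only the formula, the `ADAPTIVE_BOOST_GAMMA` environment variable, and its default of 0 come from the summary. With gamma = 0 the factor (1 - q_hint_t)^gamma is 1, so the boost reduces to the constant TOKEN_BOOST.

```python
import os


def adaptive_boost(token_boost: float, q_hint: float, gamma: float) -> float:
    """Return beta_t = token_boost * (1 - q_hint) ** gamma.

    q_hint is the model's probability for the hinted token at this position.
    (Hypothetical helper; only the formula is taken from the PR summary.)
    """
    if not 0.0 <= q_hint <= 1.0:
        raise ValueError("q_hint must be a probability in [0, 1]")
    return token_boost * (1.0 - q_hint) ** gamma


# Env var name and default (0) are from the summary; how the PR actually
# parses it is an assumption.
gamma = float(os.environ.get("ADAPTIVE_BOOST_GAMMA", "0"))
```

For example, with a boost of 2.0 and q_hint = 0.5, gamma = 1 halves the boost to 1.0 and gamma = 2 quarters it to 0.5, while gamma = 0 leaves it at 2.0 regardless of q_hint.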

other

Strict token-only n-gram tilt path with within/word/agree boosts disabled

parameters: {"WITHIN_BOOST":0,"WORD_BOOST":0,"AGREE_ADD_BOOST":0}

Novel Contributions

Confidence-Adaptive N-gram Boost
Per-position scaling of n-gram tilt by the model's own confidence on the hinted token
Two-line adaptive-gamma wiring on top of the PR #2018 strict token-only n-gram tilt stack
Reported consistent improvement for gamma in {1, 2} with best result at gamma=1
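As a rough illustration of per-position scaling, the sketch below applies the adaptive boost to one position's logits. It assumes the tilt is an additive bump on the hinted token's logit followed by renormalization; the summary does not state how the tilt is applied, so that mechanism, the function name `tilt_log_probs`, and its arguments are all hypothetical.

```python
import math


def tilt_log_probs(logits, hint_id, token_boost, gamma):
    """Tilt one position's distribution toward the n-gram hinted token.

    Assumption: the tilt adds beta_t to the hinted token's logit and
    renormalizes. Returns (log_probs, beta_t).
    """
    # Model confidence on the hinted token (numerically stable softmax).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    q_hint = exps[hint_id] / sum(exps)

    # Confidence-adaptive boost: shrinks as q_hint approaches 1 when gamma > 0.
    beta = token_boost * (1.0 - q_hint) ** gamma

    # Apply the boost and renormalize with log-sum-exp.
    tilted = list(logits)
    tilted[hint_id] += beta
    m2 = max(tilted)
    log_z = m2 + math.log(sum(math.exp(x - m2) for x in tilted))
    return [x - log_z for x in tilted], beta
```

Under this assumption, a position where the model is already nearly certain of the hinted token receives almost no extra boost for gamma >= 1, whereas gamma = 0 applies the full TOKEN_BOOST everywhere.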