PR #1024

open

Evidence-aware Dirichlet concentration, 35% improvement over fixed c=5.0

by immartian
val_bpb: 0.0830

Training Techniques

Other
Evidence-aware Dirichlet concentration for hierarchical CTW mixing: adapts the smoothing strength to each context's observation count and its specificity (an IDF-style weight).
parameters: {"base_concentration":5,"formula":"c_eff = c_base / (1 + beta * np.log1p(ctx_count) * specificity_boost)"}
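The same formula in math notation, with n_ctx = ctx_count and s_ctx = specificity_boost. The PR does not give a value for β. The second line is an assumption, not taken from the PR. It shows the standard hierarchical Dirichlet estimate, in which the concentration weights the parent (shorter) context's prediction, so that the effect of c_eff is visible.

```latex
c_{\text{eff}} = \frac{c_{\text{base}}}{1 + \beta \, \ln(1 + n_{\text{ctx}}) \, s_{\text{ctx}}},
\qquad c_{\text{base}} = 5

P(x \mid \text{ctx}) = \frac{n(x, \text{ctx}) + c_{\text{eff}} \, P(x \mid \text{parent})}{n_{\text{ctx}} + c_{\text{eff}}}
```

At n_ctx = 0, c_eff = c_base, so unseen contexts are smoothed exactly as with the fixed c=5.0. As evidence accumulates, c_eff shrinks and the estimate leans more on the context's own counts. Assuming β > 0 and s_ctx > 0, a larger specificity speeds up that shrinkage.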

Novel Contributions

  • Replaces the fixed Dirichlet concentration c=5.0 with an evidence-aware concentration.
  • Adapts smoothing per position using context frequency and context specificity (IDF).
  • Drop-in replacement for the hierarchical CTW mixing in PR #986.
  • Claims a 35% improvement over the fixed concentration on a synthetic benchmark.
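The following is a minimal Python sketch of how the evidence-aware concentration might slot into hierarchical smoothing. It assumes that the concentration acts as the pseudo-count weight on the shorter (parent) context's distribution, as in standard hierarchical Dirichlet smoothing. PR #986's actual CTW mixing may combine context orders differently. The names `effective_concentration` and `smoothed_distribution` are hypothetical, as are the value `beta=0.5` and the choice to pass specificity as a precomputed per-order weight. The PR itself specifies only `base_concentration=5` and the `c_eff` formula.

```python
import numpy as np


def effective_concentration(ctx_count, specificity_boost, c_base=5.0, beta=0.5):
    # Formula from the PR parameters. beta=0.5 is a HYPOTHETICAL default (not stated in the PR).
    # ctx_count = 0 gives c_eff = c_base, i.e. the fixed-c baseline for unseen contexts.
    return c_base / (1.0 + beta * np.log1p(ctx_count) * specificity_boost)


def smoothed_distribution(counts_by_order, specificity_by_order, vocab_size,
                          c_base=5.0, beta=0.5, adaptive=True):
    """Hypothetical hierarchical Dirichlet back-off, shortest context first.

    counts_by_order[k] holds next-symbol counts for the order-k context.
    specificity_by_order[k] is an ASSUMED precomputed IDF-style weight for it.
    """
    p = np.full(vocab_size, 1.0 / vocab_size)  # uniform base distribution
    for counts, spec in zip(counts_by_order, specificity_by_order):
        counts = np.asarray(counts, dtype=float)
        n = counts.sum()
        # Evidence-aware concentration, or the fixed baseline for comparison.
        c = effective_concentration(n, spec, c_base, beta) if adaptive else c_base
        # Dirichlet posterior mean with the parent distribution as the prior mean.
        p = (counts + c * p) / (n + c)
    return p


# Usage: one symbol dominates both the order-0 and order-1 contexts.
counts = [[10, 0, 0, 0], [20, 0, 0, 0]]
spec = [1.0, 2.0]
print(smoothed_distribution(counts, spec, 4, adaptive=False))
print(smoothed_distribution(counts, spec, 4, adaptive=True))
```

In this sketch, the adaptive version shrinks c_eff as counts grow, so it trusts observed evidence more than the fixed c=5.0 baseline does. With zero counts, the two coincide.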