PR #1024
open
Evidence-aware Dirichlet concentration, 35% improvement over fixed c=5.0
by immartian
val_bpb
0.0830
Architecture
—
Optimizer
—
Artifact Size
—
Training Techniques
Other
other
Evidence-aware Dirichlet concentration for hierarchical CTW mixing; adapts smoothing using context count and context specificity (IDF).
parameters: {"base_concentration":5,"formula":"c_eff = c_base / (1 + beta * np.log1p(ctx_count) * specificity_boost)"}
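A minimal sketch of the formula above, written for scalar inputs with `math.log1p` in place of `np.log1p`. The function name, the default `beta=0.5`, and the default `specificity_boost=1.0` are assumptions for illustration; the PR summary gives only `base_concentration` and the formula, not values for `beta` or how the IDF-based boost is computed.

```python
import math


def effective_concentration(ctx_count, specificity_boost=1.0, c_base=5.0, beta=0.5):
    """Evaluate c_eff = c_base / (1 + beta * log1p(ctx_count) * specificity_boost).

    beta and specificity_boost defaults are hypothetical placeholders;
    only c_base = 5 comes from the PR summary.
    """
    if ctx_count < 0:
        raise ValueError("ctx_count must be non-negative")
    return c_base / (1.0 + beta * math.log1p(ctx_count) * specificity_boost)


if __name__ == "__main__":
    # More observations of a context shrink the concentration,
    # so the estimate leans less on the prior and more on the counts.
    for n in (0, 1, 10, 100):
        print(n, round(effective_concentration(n), 3))
```

With `ctx_count = 0` the denominator is 1, so the result falls back to the fixed `c_base`; larger counts or a larger specificity boost reduce the effective concentration.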
Novel Contributions
- Replaces fixed Dirichlet concentration c=5.0 with evidence-aware concentration.
- Adapts smoothing per position using context frequency and context specificity (IDF).
- Drop-in replacement for hierarchical CTW mixing in PR #986.
- Claims 35% improvement over fixed concentration on a synthetic benchmark.
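
To illustrate why the concentration matters, here is a sketch of standard Dirichlet-smoothed mixing of a context's counts with a backoff probability. This is the textbook posterior-mean form, not necessarily the exact mixing rule used in PR #986, which is not shown here; `dirichlet_mix` and its argument names are hypothetical.

```python
def dirichlet_mix(sym_count, ctx_count, p_backoff, c):
    """Posterior-mean probability under a Dirichlet prior centered on p_backoff.

    sym_count: times the symbol followed this context
    ctx_count: times the context was seen at all
    p_backoff: probability from the shorter (parent) context
    c:         concentration; larger values trust the backoff more
    """
    if c <= 0:
        raise ValueError("concentration must be positive")
    return (sym_count + c * p_backoff) / (ctx_count + c)


if __name__ == "__main__":
    # Same counts, two concentrations: the smaller c stays closer
    # to the empirical frequency 8/10, the larger c pulls toward 0.1.
    print(dirichlet_mix(8, 10, 0.1, c=5.0))
    print(dirichlet_mix(8, 10, 0.1, c=1.0))
```

Under this form, a fixed c applies the same pull toward the backoff everywhere, while an evidence-aware c lets well-observed, specific contexts rely more on their own counts.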