PR #943 (closed)
Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean)
by aamodbhatt
val_bpb: 0.0165
Architecture: Transformer
Optimizer: —
Artifact Size: 13,810,840 bytes
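For reference, val_bpb is validation bits per byte: the model's summed negative log-likelihood converted to bits and divided by the byte count of the evaluated text. A minimal sketch of the conversion (a reminder of the metric, not code from this PR):

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed token-level NLL (in nats) to bits per byte."""
    return total_nll_nats / math.log(2) / total_bytes
```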
Training Techniques

Architecture: Packed causal memory
Added a packed causal n-gram memory path built from train shards and loaded at eval start.
Parameters: none
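The builder itself isn't shown in this record; below is a hedged sketch of what a packed causal n-gram memory built from train shards could look like, assuming uint16 token shards and a flat count table packed for fast loading at eval start (`build_ngram_memory`, `pack_memory`, and `max_order=4` are all illustrative, not names from the PR):

```python
import numpy as np
from collections import defaultdict

def build_ngram_memory(shard_paths, max_order=4):
    """Count (context, next-token) pairs for orders 1..max_order over shards.

    Causal by construction: a context only ever precedes its target token.
    Returns, per order, a dict from context tuple to next-token counts.
    """
    memory = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order)]
    for path in shard_paths:
        tokens = np.fromfile(path, dtype=np.uint16)   # assumed shard format
        for order in range(1, max_order + 1):
            table = memory[order - 1]
            for i in range(order, len(tokens)):
                ctx = tuple(tokens[i - order:i].tolist())
                table[ctx][int(tokens[i])] += 1
    return memory

def pack_memory(memory, out_path):
    """Pack the nested dicts into one flat uint64 array for fast eval loading.

    One record per (order, context hash, next token, count); a real packer
    would sort by hash so eval can binary-search, this sketch just serializes.
    """
    records = []
    for order_idx, table in enumerate(memory):
        for ctx, nexts in table.items():
            h = hash(ctx) & 0xFFFFFFFFFFFFFFFF
            for tok, count in nexts.items():
                records.append((order_idx + 1, h, tok, count))
    np.save(out_path, np.array(records, dtype=np.uint64))
```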
Other
Dirichlet-normalized multi-order mixing with count-confidence gating for n-gram evaluation.
Parameters: none
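The record names the two ingredients but not how they compose. One plausible composition is sketched below, with `alpha`, `tau`, and the order-by-order blend all assumed rather than taken from the PR:

```python
import numpy as np

def dirichlet_mix(count_vectors, base_probs, alpha=0.1, tau=50.0):
    """Blend per-order n-gram evidence into the base model distribution.

    count_vectors: one (vocab,)-shaped count array per n-gram order, shortest
    order first; base_probs: the model's (vocab,) next-token distribution.
    alpha and tau are illustrative, not values from the PR.
    """
    mixed = np.asarray(base_probs, dtype=np.float64).copy()
    for counts in count_vectors:
        n = float(counts.sum())
        if n == 0.0:
            continue                      # no evidence at this order: gate out
        vocab = counts.shape[0]
        p = (counts + alpha) / (n + alpha * vocab)   # Dirichlet-normalized
        w = n / (n + tau)                 # count-confidence gate
        mixed = (1.0 - w) * mixed + w * p
    return mixed / mixed.sum()            # renormalize against float drift
```

Under this gating, sparsely attested contexts barely perturb the base model, while well-attested ones dominate it.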
Evaluation: full-rescore
Parameters: {"all_chunks": true, "two_pass": true}
Novel Contributions
- Packed causal n-gram memory path built from train shards and loaded at eval start
- Dirichlet-normalized multi-order mixing
- Count-confidence gating
- Optional phrase-suffix expert exploration with Dirichlet-only winner config (see the sketch after this list)
- Compliance-first submission with score-first ordering preserved
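The phrase-suffix expert is only named in this record, so the following is purely one possible reading: a longest-suffix lookup over multi-token phrases that, per the bullet above, was explored but lost out to the Dirichlet-only configuration.

```python
def phrase_suffix_probs(context, phrase_table, max_len=8):
    """Longest-suffix phrase expert (hypothetical; names and max_len assumed).

    Walks from the longest suffix of the context downward and returns the
    stored next-token distribution for the first phrase that matches.
    """
    for length in range(min(max_len, len(context)), 0, -1):
        suffix = tuple(context[-length:])
        if suffix in phrase_table:
            return phrase_table[suffix]   # (vocab,) probability array
    return None  # no match: the Dirichlet-only winner config skips this expert
```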