PR #943

closed

Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean)

by aamodbhatt
val_bpb: 0.0165
Architecture: Transformer
Optimizer:
Artifact Size: 13,810,840 bytes

Training Techniques

Architecture: Packed causal memory
Added a packed causal n-gram memory path built from train shards and loaded at eval start.
parameters: null
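
A minimal sketch of what a packed causal n-gram memory built from train shards could look like. The PR does not show its code, so all names (`build_memory`, the shard format, the packing layout) are illustrative assumptions; the key property preserved here is causality — each context uses only past tokens.

```python
# Hypothetical sketch: build a causal n-gram memory from token shards,
# then pack it into a context-keyed structure for eval-time lookup.
# The actual on-disk packed format used by the PR is not shown.
from collections import Counter

def build_memory(shards, order=3):
    """Count (context -> next token) pairs causally over each shard."""
    counts = Counter()
    for tokens in shards:                      # each shard: a list of token ids
        for i in range(order, len(tokens)):
            ctx = tuple(tokens[i - order:i])   # causal: context precedes target
            counts[(ctx, tokens[i])] += 1
    # "Pack" into a flat dict keyed by context for O(1) lookup at eval start
    memory = {}
    for (ctx, tok), c in counts.items():
        memory.setdefault(ctx, {})[tok] = c
    return memory

mem = build_memory([[1, 2, 3, 1, 2, 3, 1, 2, 4]], order=2)
# context (1, 2) was followed by token 3 twice and token 4 once
```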
Other: other
Dirichlet-normalized multi-order mixing with count-confidence gating for n-gram evaluation.
parameters: null
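
One way to read "Dirichlet-normalized multi-order mixing with count-confidence gating" is sketched below: each n-gram order produces a Dirichlet-smoothed distribution, and a gate derived from the context's observed count decides how much to trust that order before backing off to lower ones. The concentration `alpha` and the gate form `total / (total + 1)` are assumptions, not the PR's actual parameters.

```python
# Hypothetical sketch: mix n-gram predictions across orders with Dirichlet
# smoothing, gated by how much evidence (count mass) each context has.
def mixed_prob(token, ctx_counts_by_order, vocab_size, alpha=0.5):
    """ctx_counts_by_order: list (low -> high order) of {token: count}
    dicts for the current context at each order ({} if context unseen)."""
    probs, gates = [], []
    for counts in ctx_counts_by_order:
        total = sum(counts.values())
        # Dirichlet-smoothed probability for this order
        p = (counts.get(token, 0) + alpha) / (total + alpha * vocab_size)
        # Count-confidence gate: trust an order only with enough evidence
        g = total / (total + 1.0)
        probs.append(p)
        gates.append(g)
    # Back off from the highest order down, weighting each by its gate
    p_mix, remaining = 0.0, 1.0
    for p, g in zip(reversed(probs), reversed(gates)):
        p_mix += remaining * g * p
        remaining *= (1.0 - g)
    p_mix += remaining * (1.0 / vocab_size)   # final uniform backoff
    return p_mix
```

Because each per-order term is a proper distribution and the gate weights telescope to 1, the mixture stays normalized over the vocabulary.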
Evaluation: full-rescore
parameters: {"all_chunks":true,"two_pass":true}
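
A hedged sketch of what a two-pass full rescore over all chunks might compute: pass one caches base-model per-token probabilities for every eval chunk, pass two blends in the memory's predictions and accumulates bits per byte. The function names, the linear interpolation weight `lam`, and the byte-level-token assumption are all illustrative, not taken from the PR.

```python
# Hypothetical sketch of a two-pass full rescore over all eval chunks.
# base_prob(chunk, i) and memory_prob(chunk, i) return the probability of
# token chunk[i] given chunk[:i]; both interfaces are assumptions.
import math

def full_rescore(chunks, base_prob, memory_prob, lam=0.3):
    nats, n_bytes = 0.0, 0
    # Pass 1: cache base-model probabilities for every position of every chunk
    cached = [[base_prob(c, i) for i in range(1, len(c))] for c in chunks]
    # Pass 2: mix in the n-gram memory and accumulate bits per byte
    for chunk, base_ps in zip(chunks, cached):
        for i, p_base in enumerate(base_ps, start=1):
            p = (1 - lam) * p_base + lam * memory_prob(chunk, i)
            nats += -math.log(p)
            n_bytes += 1   # assumes byte-level tokens (1 token = 1 byte)
    return nats / (math.log(2) * n_bytes)   # val_bpb
```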

Novel Contributions

  • Packed causal n-gram memory path built from train shards and loaded at eval start
  • Dirichlet-normalized multi-order mixing
  • Count-confidence gating
  • Optional phrase-suffix expert exploration with Dirichlet-only winner config
  • Compliance-first submission with score-first ordering preserved