PR #944

open

Record: Compliance-First Packed Causal Memory + Dirichlet Mixing — val_bpb 0.01654407 (3-seed mean)

by aamodbhatt
val_bpb: 0.0165
Architecture: Transformer
Optimizer: not listed
Artifact Size: 13,810,840 bytes

Training Techniques

Architecture
  • BigramHash: Packed causal n-gram memory path built from training shards and loaded at eval start; multi-order hashed n-gram tables used for causal scoring. (parameters: null)
Other
  • Dirichlet-normalized multi-order mixing over n-gram orders with count-confidence gating. (parameters: null)
  • Optional packed phrase-suffix expert blended after the n-gram posterior with confidence throttling. (parameters: null)
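
The summary above names hashed multi-order n-gram tables and Dirichlet-normalized mixing with count-confidence gating, but gives no formulas. The following is a minimal sketch of one plausible reading, not the PR's implementation: each order refines the lower-order estimate via Dirichlet-prior smoothing, p_k(w) = (c_k(w) + alpha * p_{k-1}(w)) / (C_k + alpha), and an order is skipped when its context count C_k falls below a threshold. The class name `HashedNgramMemory`, the hash function, and the parameters `alpha`, `min_count`, and `num_buckets` are all hypothetical.

```python
from collections import defaultdict


class HashedNgramMemory:
    """Toy multi-order n-gram memory with hashed contexts (illustrative only)."""

    def __init__(self, vocab_size, max_order, num_buckets=1 << 16):
        self.vocab_size = vocab_size
        self.max_order = max_order
        self.num_buckets = num_buckets
        # tables[k][bucket][token] = count of `token` after a length-k context.
        self.tables = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def _bucket(self, context):
        # Simple polynomial hash; an assumption, not the PR's hash.
        h = 0
        for t in context:
            h = (h * 1000003 + t + 1) % self.num_buckets
        return h

    def update(self, history, token):
        """Record `token` as the continuation of every context suffix."""
        for k in range(min(self.max_order, len(history)) + 1):
            context = history[len(history) - k:]
            self.tables[k][self._bucket(context)][token] += 1

    def predict(self, history, alpha=1.0, min_count=1):
        """Mix orders from low to high, starting from a uniform base."""
        probs = [1.0 / self.vocab_size] * self.vocab_size
        for k in range(min(self.max_order, len(history)) + 1):
            context = history[len(history) - k:]
            row = self.tables[k].get(self._bucket(context))
            total = sum(row.values()) if row else 0
            if total < min_count:
                continue  # count-confidence gate: too little evidence
            # Dirichlet posterior mean with the lower-order estimate as prior.
            probs = [(row.get(w, 0) + alpha * probs[w]) / (total + alpha)
                     for w in range(self.vocab_size)]
        return probs


# Usage: build the memory from a toy "training shard", then query it.
tokens = [0, 1, 2] * 4
memory = HashedNgramMemory(vocab_size=3, max_order=2)
for i, tok in enumerate(tokens):
    memory.update(tokens[:i], tok)
probs = memory.predict([0, 1])  # token 2 always followed (0, 1) in training
```

Because each bucket stores a full row of continuation counts, the context total is the row sum, so the mixed distribution stays normalized even when two contexts collide in the same bucket.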

Novel Contributions

  • Packed causal n-gram memory path built from training shards and loaded at eval start
  • Dirichlet-normalized multi-order mixing with count-confidence gating
  • Optional packed phrase-suffix expert with confidence throttling
  • Compliance-first score-first causal evaluation stack
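
The summary does not define "score-first" evaluation. One common reading, assumed here, is that each token is scored under the model state built only from earlier tokens, and the model is allowed to adapt on that token only afterwards. The sketch below illustrates that ordering with a toy byte-level model; `AdaptiveUnigram` and `score_first_bpb` are hypothetical names, and the model is a stand-in rather than anything from the PR.

```python
import math
from collections import Counter


class AdaptiveUnigram:
    """Toy stand-in model: add-one smoothed unigram over byte values."""

    def __init__(self, vocab_size=256):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def prob(self, token):
        return (self.counts[token] + 1) / (self.total + self.vocab_size)

    def update(self, token):
        self.counts[token] += 1
        self.total += 1


def score_first_bpb(data: bytes, model) -> float:
    """Return bits per byte, scoring each byte before the model sees it."""
    total_bits = 0.0
    for token in data:
        total_bits += -math.log2(model.prob(token))  # 1) score
        model.update(token)                          # 2) only then adapt
    return total_bits / len(data)


# Usage: the first byte is scored by an untouched model, so it costs 8 bits.
first_byte_cost = score_first_bpb(b"a", AdaptiveUnigram())
repeated_cost = score_first_bpb(b"aaaa", AdaptiveUnigram())
```

Here tokens are single bytes, so bits per token equals bits per byte; with a subword tokenizer the total bits would instead be divided by the byte length of the decoded text. Swapping the two steps in the loop would let the model see each token before scoring it, which is exactly the leak that a score-first ordering avoids.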