BackoffNgramMixer

Architecture
Used in: 2 PRs
Best BPB: 0.0308
Avg BPB: 0.3490

Hyperparameters Across PRs

pr_number    parameters
813          {"orders":"2-7"}
883          {"max_order":13,"experts":13}