MoE

Architecture
Used in: 3 PRs
Best BPB: 1.1180
Avg BPB: 1.2510

Hyperparameters Across PRs

pr_number  parameters
480        {"experts":2,"expert_multiplier":1.5}
981        {"moe_layers":0,"total_layers":2}
1451
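
The parameter strings above are JSON objects, so they can be loaded and compared programmatically. The following is a minimal sketch, not part of the original page: the names `ROWS`, `parse_rows`, and `seen` are hypothetical, and it assumes that the missing entry for PR 1451 should be treated as an empty parameter set.

```python
import json

# Rows copied from the table above as (pr_number, raw parameter string).
# PR 1451 lists no parameters; treating that as an empty string is an assumption.
ROWS = [
    (480, '{"experts":2,"expert_multiplier":1.5}'),
    (981, '{"moe_layers":0,"total_layers":2}'),
    (1451, ''),
]


def parse_rows(rows):
    """Return {pr_number: parameter dict}; an empty string becomes {}."""
    return {pr: (json.loads(raw) if raw else {}) for pr, raw in rows}


params = parse_rows(ROWS)

# Map each parameter name to the PRs that report it.
seen = {}
for pr, p in params.items():
    for key in p:
        seen.setdefault(key, []).append(pr)

for key, prs in sorted(seen.items()):
    print(key, prs)
```

In this table no parameter name appears in more than one PR, so the rows cannot be compared key by key without further information about each PR.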