
MoE (Architecture)

Used in:  4 PRs
Best BPB: 0.8335
Avg BPB:  1.1466

Hyperparameters Across PRs

pr_number  parameters
480        {"experts": 2, "expert_multiplier": 1.5}
981        {"moe_layers": 0, "total_layers": 2}
1451
1901       {"shared_experts": 1, "specialized_experts": 3, "top_k": 1}
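The parameters for PR 1901 describe a shared-plus-specialized expert layout with top-k routing: shared experts process every token, while a router selects the top-k specialized experts per token. A minimal toy sketch of that routing pattern, assuming scalar "experts" and an illustrative linear router (none of these names come from the PRs themselves):

```python
import math
import random

random.seed(0)

# Values taken from PR 1901's parameters; everything else is illustrative.
SHARED = 1        # shared_experts: applied to every token
SPECIALIZED = 3   # specialized_experts: chosen per-token by the router
TOP_K = 1         # top_k: specialized experts used per token

# Toy experts: each maps a scalar hidden state to a new scalar.
shared_experts = [lambda h: 0.5 * h for _ in range(SHARED)]
specialized_experts = [lambda h, s=s: s * h for s in (1.0, 2.0, 3.0)]
router_weights = [random.gauss(0, 1) for _ in range(SPECIALIZED)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def moe_layer(h):
    # Shared experts run unconditionally on every token.
    out = sum(f(h) for f in shared_experts)
    # Router scores each specialized expert; only the top-k fire,
    # weighted by their (softmaxed) gate values.
    logits = [w * h for w in router_weights]
    gates = softmax(logits)
    top = sorted(range(SPECIALIZED), key=lambda i: -logits[i])[:TOP_K]
    for i in top:
        out += gates[i] * specialized_experts[i](h)
    return out

print(moe_layer(1.0))
```

In a real transformer each expert is an MLP over a hidden vector rather than a scalar function, but the control flow (unconditional shared path plus sparse, gated specialized path) is the same.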