MoE
Architecture
Used in: 4 PRs
Best BPB: 0.8335
Avg BPB: 1.1466
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 480 | {"experts":2,"expert_multiplier":1.5} |
| 981 | {"moe_layers":0,"total_layers":2} |
| 1451 | — |
| 1901 | {"shared_experts":1,"specialized_experts":3,"top_k":1} |