MoE (Mixture of Experts)

| Category | Used in | Best BPB | Avg BPB |
|---|---|---|---|
| Architecture | 3 PRs | 1.1180 | 1.2510 |

BPB is bits per byte; lower is better.

Hyperparameters Across PRs

| PR | Parameters |
|---|---|
| 480 | `{"experts": 2, "expert_multiplier": 1.5}` |
| 981 | `{"moe_layers": 0, "total_layers": 2}` |
| 1451 | — |
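
As an illustration only, here is one way the PR 480 keys could map onto a routed MLP block. The sketch makes two assumptions that the source does not confirm. First, `expert_multiplier` scales each expert's hidden width relative to `d_model`. Second, tokens use top-1 softmax routing. `MoEMLP`, `d_model`, and the helper functions are hypothetical names, not taken from any of the PRs.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def init_matrix(rows, cols, rng):
    """Gaussian init scaled by 1/sqrt(fan_in)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MoEMLP:
    """Hypothetical top-1 routed mixture-of-experts MLP block.

    Assumption: expert_multiplier sets each expert's hidden width to
    expert_multiplier * d_model; the source does not define it.
    """

    def __init__(self, d_model, experts=2, expert_multiplier=1.5, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.hidden = int(d_model * expert_multiplier)
        # Router: one logit per expert.
        self.router = init_matrix(experts, d_model, rng)
        # Each expert is a two-layer MLP: d_model -> hidden -> d_model.
        self.experts = [
            (init_matrix(self.hidden, d_model, rng),
             init_matrix(d_model, self.hidden, rng))
            for _ in range(experts)
        ]

    def forward(self, x):
        """Route one token vector x to its highest-probability expert."""
        logits = matvec(self.router, x)
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        k = max(range(len(probs)), key=probs.__getitem__)
        w_in, w_out = self.experts[k]
        y = matvec(w_out, relu(matvec(w_in, x)))
        # Scaling by the gate probability keeps the router differentiable in training.
        return k, [probs[k] * yi for yi in y]

    def param_count(self):
        """Total weights: router plus all experts (no biases)."""
        per_expert = 2 * self.hidden * self.d_model
        return len(self.router) * self.d_model + len(self.experts) * per_expert


if __name__ == "__main__":
    layer = MoEMLP(d_model=8, experts=2, expert_multiplier=1.5)
    k, y = layer.forward([0.1] * 8)
    print("routed to expert", k, "output size", len(y))
    print("total params", layer.param_count())
```

With `d_model=8`, the hidden width is 12. Each expert therefore holds 192 weights, and the block totals 400 weights including the 16 router weights. Only one expert (192 weights plus the router) runs per token.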