Residual Input Mixing
Architecture
Used in: 1 PR
Best BPB: 1.1169
Avg BPB: 1.1169
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 615 | {"layers":11,"dimension":512,"MHA":"8/8","MLP":"3.5x (1792)","BigramHash":8192,"XSA":"all layers","mixed residuals":"each layer from 2 previous layers"} |
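The table only states that each layer draws on the 2 previous layers; it does not say how the two are combined. As a rough illustration, the sketch below assumes the simplest reading: each layer's input is a weighted sum of the two most recent layer outputs, followed by an ordinary residual connection. The names `mix_inputs` and `forward`, the scalar weights, and the use of the embedding as both predecessors for the first layer are all hypothetical and are not taken from PR 615.

```python
# Minimal sketch of residual input mixing (assumed form, not the PR's code).
# Vectors are plain Python lists so the example has no dependencies.

def mix_inputs(prev, prev2, w_prev, w_prev2):
    """Weighted sum of the two most recent layer outputs (assumed mixing rule)."""
    return [w_prev * a + w_prev2 * b for a, b in zip(prev, prev2)]


def forward(x, layers, mix_weights):
    """Run a stack of layers where each layer reads from the 2 previous outputs.

    x           -- input vector (stands in for the token embedding)
    layers      -- list of callables mapping a vector to a vector
    mix_weights -- one (w_prev, w_prev2) pair per layer; learned in a real model
    """
    # Assumption: before any layer has run, the embedding fills both slots.
    history = [x, x]
    for layer, (w_prev, w_prev2) in zip(layers, mix_weights):
        layer_in = mix_inputs(history[-1], history[-2], w_prev, w_prev2)
        # Standard residual connection around the layer body.
        out = [a + b for a, b in zip(layer_in, layer(layer_in))]
        history.append(out)
    return history[-1]


if __name__ == "__main__":
    identity_block = lambda v: list(v)  # toy layer body for demonstration
    y = forward([1.0], [identity_block, identity_block], [(0.5, 0.5), (0.5, 0.5)])
    print(y)
```

With weights `(1.0, 0.0)` for every layer, this reduces to a plain residual stack, which makes that setting a convenient sanity check for any real implementation.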