
Residual Input Mixing

Category: Architecture
Used in: 1 PR
Best BPB: 1.1169
Avg BPB: 1.1169

Hyperparameters Across PRs

pr_number | parameters
615 | {"layers":11,"dimension":512,"MHA":"8/8","MLP":"3.5x (1792)","BigramHash":8192,"XSA":"all layers","mixed residuals":"each layer from 2 previous layers"}
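
The "mixed residuals" entry says only that each layer draws its input from the 2 previous layers; the exact mixing rule is not given here. As a minimal sketch, the code below assumes a weighted sum of the two previous layer outputs with one pair of scalar weights per layer. The names `mix_inputs`, `forward`, and `mix_weights` are hypothetical and used for illustration only, and the layers are placeholder callables rather than the real attention and MLP blocks. It also checks that the listed MLP width matches the expansion factor (3.5 × 512 = 1792).

```python
import json

# Hyperparameters copied from the PR 615 row above.
config = json.loads(
    '{"layers":11,"dimension":512,"MHA":"8/8","MLP":"3.5x (1792)",'
    '"BigramHash":8192,"XSA":"all layers",'
    '"mixed residuals":"each layer from 2 previous layers"}'
)

# The MLP width follows from the expansion factor: 3.5 * 512 = 1792.
mlp_hidden = int(3.5 * config["dimension"])


def mix_inputs(prev, prev_prev, w_prev, w_prev_prev):
    """Assumed mixing rule: weighted sum of the two previous layer outputs."""
    return [w_prev * a + w_prev_prev * b for a, b in zip(prev, prev_prev)]


def forward(x, layers, mix_weights):
    """Run a stack where each layer reads a mix of the two previous outputs.

    `layers` are placeholder callables (list -> list). The first layer has
    only one predecessor, so this sketch feeds it the embedding twice
    (an assumption, not something stated in the source).
    """
    prev_prev, prev = x, x
    for layer, (w1, w2) in zip(layers, mix_weights):
        mixed = mix_inputs(prev, prev_prev, w1, w2)
        out = layer(mixed)
        # Slide the two-layer window forward by one layer.
        prev_prev, prev = prev, out
    return prev
```

With weights `(1.0, 0.0)` at every layer the sketch reduces to an ordinary sequential stack, which gives a convenient sanity check; any other weighting lets a layer see the output from two layers back directly.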