token-shift mixing

Architecture
Used in: 1 PR
Best BPB: 1.2252
Avg BPB: 1.2252

Hyperparameters Across PRs

pr_number    parameters
1112         {"layers":8}