
Unified Attention

Category: Architecture
Used in: 2 PRs
Best BPB: 1.1088
Avg BPB: 1.1250
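The page does not define BPB. Assuming it is the usual bits-per-byte metric (lower is better), the sketch below shows how a model's summed cross-entropy loss in nats converts to bits per byte of UTF-8 text. The function name `bits_per_byte` and the example numbers are hypothetical and are not taken from either PR.

```python
import math


def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a text
    into bits per byte of that text's UTF-8 encoding.

    Assumption: this is the standard BPB definition; the page does not
    state how its BPB figures were computed.
    """
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    # Divide by ln(2) to go from nats to bits, then normalise by byte count.
    return total_nll_nats / (math.log(2) * num_bytes)


# Hypothetical example: 1000 tokens at a mean loss of 2.0 nats/token,
# covering 2500 bytes of text.
print(round(bits_per_byte(1000 * 2.0, 2500), 4))
```

Normalising by bytes rather than tokens makes scores comparable across models that use different tokenizers.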

Hyperparameters Across PRs

PR number   Parameters
1202        {"layers":11,"dimension":528,"heads":4}
1270        (none listed)
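The parameters for PR 1202 are stored as a JSON object. The sketch below parses that exact string and derives a per-head width. It assumes the model dimension is split evenly across attention heads, which is the standard multi-head convention; the page does not confirm that Unified Attention partitions its width this way, and the helper `head_dim` is hypothetical.

```python
import json

# Parameters string as listed for PR 1202 on this page.
raw = '{"layers":11,"dimension":528,"heads":4}'


def head_dim(params: dict) -> int:
    """Per-head width, assuming dimension is divided evenly across heads
    (standard multi-head attention; not confirmed for Unified Attention)."""
    dim, heads = params["dimension"], params["heads"]
    if dim % heads != 0:
        raise ValueError(f"dimension {dim} not divisible by heads {heads}")
    return dim // heads


params = json.loads(raw)
# layers, model dimension, heads, per-head width
print(params["layers"], params["dimension"], params["heads"], head_dim(params))
```

Under that assumption, each of the 4 heads in PR 1202 would be 528 / 4 = 132 wide.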