← Back to Architecture

shared attention

Architecture
Used in
2 PRs
Best BPB
1.3527
Avg BPB
1.3553

Hyperparameters Across PRs

pr_numberparameters
1025{"layers":11,"bases":10,"rank":128}
1235{"shared_pairs":4}