Shared-Specific Attention

Architecture
Used in: 1 PR
Best BPB: 1.0981
Avg BPB: 1.0981

Hyperparameters Across PRs

pr_number  parameters
1774       {"shared_head_dim":16,"specific_dim":48}