Per-head gated attention

Architecture
Used in: 1 PR
Best BPB: 1.4750
Avg BPB: 1.4750

Hyperparameters Across PRs

pr_number | parameters
607 |