
Differential Attention V2

Architecture
Used in: 1 PR
Best BPB: 1.8522
Avg BPB: 1.8522

Hyperparameters Across PRs

pr_number    parameters
345