Sparse Attention Gate

Category: Architecture
Used in: 5 PRs
Best BPB: 1.0586
Avg BPB: 1.0627

Hyperparameters Across PRs

pr_number   parameters
1855        {"gate_window":12,"scale":0.5}
1953        (none listed)
2088        (none listed)
2123        (none listed)
2162        {"gate_window":12}
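
To compare settings across the five PRs, the parameter strings can be parsed and grouped by hyperparameter. The following is a minimal sketch: the names RAW_PARAMS and summarize are hypothetical and not part of any project tooling. PRs with no parameters listed are treated as empty records, since the table does not say which values they actually used.

```python
import json
from collections import defaultdict

# Parameter strings copied from the table; an empty string means the
# table lists no parameters for that PR (its real settings are unknown).
RAW_PARAMS = {
    1855: '{"gate_window":12,"scale":0.5}',
    1953: "",
    2088: "",
    2123: "",
    2162: '{"gate_window":12}',
}


def summarize(raw_params):
    """Map each hyperparameter name to {value: [PR numbers that list it]}."""
    summary = defaultdict(lambda: defaultdict(list))
    for pr_number, raw in sorted(raw_params.items()):
        # Treat a missing parameter string as an empty record.
        params = json.loads(raw) if raw else {}
        for name, value in params.items():
            summary[name][value].append(pr_number)
    return {name: dict(values) for name, values in summary.items()}


SUMMARY = summarize(RAW_PARAMS)
for name, values in SUMMARY.items():
    print(name, values)
```

Grouping by value rather than by PR makes it easy to see that gate_window is 12 wherever it is listed, and that scale appears only in PR 1855.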