Sparse Attention Gate

Architecture
Used in: 3 PRs
Best BPB: 1.0586
Avg BPB: 1.0647

Hyperparameters Across PRs

pr_number   parameters
1855        {"gate_window":12,"scale":0.5}
1953
2088