← Back to Architecture

sliding window attention

Architecture
Used in
7 PRs
Best BPB
1.0205
Avg BPB
1.1665

Hyperparameters Across PRs

pr_numberparameters
5{"window_size":null}
914{"window_size":512}
1562
1728{"local_window_512_layers":[0,1,2,3,5],"local_window_1024_layers":[6,7,8]}
1863{"windows":[1024,4096]}
1882{"window":1024,"global_layer_stride":2}
2040