
Attention modification

Category: Architecture
Used in: 8 PRs
Best BPB (bits per byte; lower is better): 1.0756
Average BPB: 1.2203

Hyperparameters Across PRs

PR number   Parameters
1212        {"window_size":512,"layers":[2,4,6,8,10]}
1239        {"curvature_range":[0.1,2]}
1530        (none recorded)
1648        {"layers":11}
1937        {"pattern":"40,80,full"}
2004        {"core_attention_block":3,"mlp_only_blocks":[4,5,6,7]}
2073        (none recorded)
2073        (none recorded)
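Some of the parameter names above suggest, but do not confirm, per-layer sliding-window attention: `window_size` with a `layers` list in PR 1212, and a `pattern` of window sizes in PR 1937. The following is a minimal sketch under those assumptions only. The helpers `sliding_window_causal_mask`, `windows_from_layer_list`, and `windows_from_pattern` are hypothetical. Layer indices are assumed to be 0-based, and the `pattern` string is assumed to repeat cyclically over layers. This is not the code from any of these PRs.

```python
from typing import List, Optional


def sliding_window_causal_mask(seq_len: int, window: Optional[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Causal constraint: j <= i. With a window, additionally i - j < window, so each
    query sees at most `window` positions (itself included). window=None means
    ordinary full causal attention.
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def windows_from_layer_list(
    window_size: int, layers: List[int], num_layers: int
) -> List[Optional[int]]:
    """Hypothetical reading of a {"window_size", "layers"} entry: the listed
    (0-based) layers use a sliding window, and every other layer uses full attention (None)."""
    windowed = set(layers)
    return [window_size if k in windowed else None for k in range(num_layers)]


def windows_from_pattern(pattern: str, num_layers: int) -> List[Optional[int]]:
    """Hypothetical reading of a {"pattern": "40,80,full"} entry: the comma-separated
    window sizes repeat cyclically across layers, and "full" means no window (None)."""
    per_slot = [None if t.strip() == "full" else int(t) for t in pattern.split(",")]
    return [per_slot[k % len(per_slot)] for k in range(num_layers)]
```

For example, under these assumptions `windows_from_pattern("40,80,full", 6)` gives `[40, 80, None, 40, 80, None]`. `windows_from_layer_list(512, [2, 4, 6, 8, 10], 11)` places a 512-token window on the even layers from 2 to 10 of an 11-layer model and leaves the other layers as full attention. Each per-layer value can then be passed to `sliding_window_causal_mask` to build that layer's mask.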