attention modification
Architecture
Used in: 8 PRs
Best BPB: 1.0756
Avg BPB: 1.2203
Submissions
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 1212 | {"window_size":512,"layers":[2,4,6,8,10]} |
| 1239 | {"curvature_range":[0.1,2]} |
| 1530 | — |
| 1648 | {"layers":11} |
| 1937 | {"pattern":"40,80,full"} |
| 2004 | {"core_attention_block":3,"mlp_only_blocks":[4,5,6,7]} |
| 2073 | — |
| 2073 | — |