
bidirectional attention

Architecture
Used in: 2 PRs
Best BPB: 1.3485
Avg BPB: 1.3542

Hyperparameters Across PRs

pr_number    parameters
1053
1403         {"is_causal":false}
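
For readers unfamiliar with the flag, the following is a minimal sketch of what turning off causal masking means in scaled dot-product attention. It is an illustration only: the `attention` function and the toy input `x` are hypothetical and are not taken from either PR; only the flag name `is_causal` comes from the parameters recorded above.

```python
import math

def attention(q, k, v, is_causal=True):
    """Single-head scaled dot-product attention over lists of vectors.

    Hypothetical helper for illustration, not code from the PRs.
    With is_causal=True, position i attends only to positions j <= i.
    With is_causal=False, every position attends to every position.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        # Positions visible to query i under the chosen masking mode.
        visible = range(i + 1) if is_causal else range(n)
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in visible
        ]
        # Numerically stable softmax over the visible positions only.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[j][c] for w, j in zip(weights, visible))
            for c in range(len(v[0]))
        ])
    return out

# Toy input: three positions, two features each (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal = attention(x, x, x, is_causal=True)
bidirectional = attention(x, x, x, is_causal=False)
```

In the causal case the first position can only see itself, so its output equals its own value vector; with `is_causal=False` it also mixes in later positions. The last position sees the full sequence in both modes, so its output is the same either way.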