bidirectional transformer
Architecture
Used in: 2 PRs
Best BPB (bits per byte; lower is better): 1.1465
Avg BPB: 1.3859
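The scores above are reported as BPB. This is a minimal sketch of how that number is usually obtained, assuming the standard definition: total cross-entropy over the evaluation text, converted from nats to bits and divided by the raw byte count rather than the token count. The function names `bits_per_byte` and `bpb_from_token_loss` are hypothetical and not taken from either PR.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (natural log, over all predicted tokens)
    into bits per byte of the underlying raw text. Hypothetical helper."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # nats -> bits (divide by ln 2), then normalize by bytes instead of tokens
    return total_nll_nats / (math.log(2) * total_bytes)


def bpb_from_token_loss(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Same quantity starting from a per-token mean loss, as typically logged
    during evaluation. Hypothetical helper."""
    return bits_per_byte(mean_loss_nats * num_tokens, num_bytes)
```

Normalizing by bytes instead of tokens makes scores comparable across runs that use different tokenizers, since a tokenizer that packs more bytes into each token would otherwise look better or worse per token for reasons unrelated to the model.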
Hyperparameters Across PRs
| PR | Layers | Model dim | Heads | KV heads | MLP mult |
|---|---|---|---|---|---|
| 820 | 9 | 512 | 8 | 4 | 2 |
| 1100 | 11 | 512 | 8 | not reported | not reported |
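To make the listed hyperparameters for PRs 820 and 1100 easier to compare, here is a small sketch that captures them as a config object and derives a few quantities from them. It assumes that `num_kv_heads` < `num_heads` means grouped-query attention (several query heads sharing one key/value head) and that the MLP hidden width is `model_dim * mlp_mult`; both are common conventions, not details confirmed by the PRs. The class `TransformerConfig` is hypothetical. Values PR 1100 did not report are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical container for the per-PR hyperparameters."""
    layers: int
    model_dim: int
    num_heads: int
    num_kv_heads: Optional[int] = None  # None = not reported
    mlp_mult: Optional[int] = None      # None = not reported

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the model width.
        if self.model_dim % self.num_heads != 0:
            raise ValueError("model_dim must be divisible by num_heads")
        return self.model_dim // self.num_heads

    @property
    def queries_per_kv_head(self) -> Optional[int]:
        # Assumed grouped-query attention: how many query heads share one K/V head.
        if self.num_kv_heads is None:
            return None
        if self.num_heads % self.num_kv_heads != 0:
            raise ValueError("num_heads must be divisible by num_kv_heads")
        return self.num_heads // self.num_kv_heads

    @property
    def mlp_hidden_dim(self) -> Optional[int]:
        # Assumed convention: hidden width = model_dim * mlp_mult.
        return None if self.mlp_mult is None else self.model_dim * self.mlp_mult


PR_820 = TransformerConfig(layers=9, model_dim=512, num_heads=8, num_kv_heads=4, mlp_mult=2)
PR_1100 = TransformerConfig(layers=11, model_dim=512, num_heads=8)
```

Under these assumptions, PR 820 has 64-dimensional heads with each K/V head shared by 2 query heads and an MLP hidden width of 1024. PR 1100 keeps the same 64-dimensional heads but adds two layers; its K/V sharing and MLP width stay unknown.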