
bidirectional transformer

Architecture
Used in: 2 PRs
Best BPB: 1.1465
Avg BPB: 1.3859

Hyperparameters Across PRs

pr_number  parameters
820        {"layers":9,"model_dim":512,"num_heads":8,"num_kv_heads":4,"mlp_mult":2}
1100       {"layers":11,"dimensions":512,"heads":8}
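The two rows above use different key names for what look like the same settings (`model_dim` vs. `dimensions`, `num_heads` vs. `heads`). The following is a minimal Python sketch for comparing them side by side. The alias mapping, the `normalize` helper, and the derived `head_dim` field are assumptions made for illustration; the page itself does not state that these keys are equivalent or that the model width is split evenly across heads.

```python
import json

# Rows copied from the table above: (PR number, parameters JSON string).
ROWS = [
    (820, '{"layers":9,"model_dim":512,"num_heads":8,"num_kv_heads":4,"mlp_mult":2}'),
    (1100, '{"layers":11,"dimensions":512,"heads":8}'),
]

# Assumed aliases (hypothetical): treat "dimensions" as "model_dim"
# and "heads" as "num_heads" so the two rows share one vocabulary.
ALIASES = {"dimensions": "model_dim", "heads": "num_heads"}


def normalize(params_json):
    """Parse one parameters cell and rename aliased keys."""
    raw = json.loads(params_json)
    params = {ALIASES.get(key, key): value for key, value in raw.items()}
    # Derived field (assumption): width per attention head,
    # taking model_dim to be divided evenly across num_heads.
    params["head_dim"] = params["model_dim"] // params["num_heads"]
    return params


configs = {pr: normalize(cell) for pr, cell in ROWS}

for pr, cfg in configs.items():
    print(pr, cfg)
```

Keys that a row does not report (for example `num_kv_heads` and `mlp_mult` for PR 1100) are left absent rather than filled with guessed defaults.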