
Transformer depth / tied embeddings / KV head count

Architecture
Used in: 1 PR
Best BPB: 1.1787
Avg BPB: 1.1787

Hyperparameters Across PRs

pr_number    parameters
310          {"layers":10,"dimensions":512,"heads":8,"kv_heads":4}
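The row for PR 310 implies a few derived sizes that are easy to check by hand. The sketch below works them out, assuming that "dimensions" is the model width, that it is split evenly across the query heads, and that "kv_heads" means grouped-query attention in the usual sense (several query heads sharing one key/value head). The function name attention_shapes and the output keys are hypothetical and chosen for illustration; they do not come from the PR.

```python
import json

# Hyperparameters copied verbatim from the PR 310 row.
params = json.loads('{"layers":10,"dimensions":512,"heads":8,"kv_heads":4}')


def attention_shapes(p):
    """Derive per-head sizes, ASSUMING standard grouped-query attention.

    Hypothetical helper for illustration; not taken from the PR itself.
    """
    # The model width must split evenly across the query heads.
    assert p["dimensions"] % p["heads"] == 0
    # Each KV head must serve a whole number of query heads.
    assert p["heads"] % p["kv_heads"] == 0

    head_dim = p["dimensions"] // p["heads"]
    queries_per_kv_head = p["heads"] // p["kv_heads"]
    # Values cached per token per layer: one K and one V vector per KV head.
    kv_values_per_token = 2 * p["kv_heads"] * head_dim
    # Same quantity if every query head had its own K/V (full multi-head attention).
    mha_values_per_token = 2 * p["heads"] * head_dim

    return {
        "head_dim": head_dim,
        "queries_per_kv_head": queries_per_kv_head,
        "kv_values_per_token_per_layer": kv_values_per_token,
        "mha_values_per_token_per_layer": mha_values_per_token,
    }


shapes = attention_shapes(params)
for name, value in shapes.items():
    print(f"{name}: {value}")
```

Under these assumptions each head is 64 wide, two query heads share each KV head, and the per-token key/value footprint per layer is half of what full multi-head attention with 8 KV heads would need.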