Transformer depth/width

Architecture
Used in: 1 PR
Best BPB: 1.6660
Avg BPB: 1.6660

Hyperparameters Across PRs

pr_number  parameters
240        {"layers":7,"model_dim":512}
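To give a rough sense of scale for the depth/width setting recorded for PR 240, the sketch below parses the parameters JSON and estimates the number of weights in the transformer blocks. It assumes a standard block layout (four d×d attention projections plus a two-layer MLP with a 4x hidden expansion) and ignores embeddings, biases, and normalization parameters. None of those details are stated on this page, and the helper name `approx_block_params` and the `mlp_ratio` argument are hypothetical, so treat the result as an order-of-magnitude estimate rather than the model's actual size.

```python
import json


def approx_block_params(layers: int, model_dim: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for the transformer blocks only.

    Assumptions (not confirmed by the source page):
      - attention uses four model_dim x model_dim projections (Q, K, V, output)
      - the MLP has two matrices of shape model_dim x (mlp_ratio * model_dim)
      - embeddings, biases, and norm parameters are ignored
    """
    attn = 4 * model_dim * model_dim
    mlp = 2 * mlp_ratio * model_dim * model_dim
    return layers * (attn + mlp)


# The parameters cell from the table row for PR 240, copied verbatim.
row = '{"layers":7,"model_dim":512}'
cfg = json.loads(row)

estimate = approx_block_params(cfg["layers"], cfg["model_dim"])
print(f"layers={cfg['layers']} model_dim={cfg['model_dim']} "
      f"approx block params={estimate:,}")
```

Under these assumptions the estimate scales linearly with depth and quadratically with width, so doubling `model_dim` costs about as many parameters as quadrupling `layers`.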