
Transformer size

Architecture
Used in: 1 PR
Best BPB: 1.6231
Avg BPB: 1.6231

Hyperparameters Across PRs

pr_number  parameters
248        {"layers":8,"model_dim":512}
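To give a rough sense of what the hyperparameters recorded for PR 248 imply about model size, the sketch below parses the JSON value and estimates the weight count of the transformer blocks. The helper `approx_block_params` and its `ffn_mult` argument are hypothetical. The formula assumes a standard layout (four attention projections and a two-matrix MLP with a 4x hidden expansion, no biases, no embeddings or norms), none of which is confirmed by the source, so treat the number as an order-of-magnitude guide only.

```python
import json

# Hyperparameters exactly as recorded for PR 248.
raw = '{"layers":8,"model_dim":512}'
params = json.loads(raw)


def approx_block_params(layers: int, model_dim: int, ffn_mult: int = 4) -> int:
    """Rough weight count for the transformer blocks only.

    Assumptions (not from the source): each layer has four
    model_dim x model_dim attention projections and a two-matrix MLP
    with hidden size ffn_mult * model_dim. Biases, norms, and
    embeddings are not counted.
    """
    attn = 4 * model_dim * model_dim            # Q, K, V, and output projections
    mlp = 2 * ffn_mult * model_dim * model_dim  # up and down projections
    return layers * (attn + mlp)


estimate = approx_block_params(params["layers"], params["model_dim"])
print(f"{estimate:,}")  # 25,165,824 under the assumptions above
```

With the default `ffn_mult` of 4 this reduces to the familiar 12 x layers x model_dim^2 approximation. The real count for the PR may differ if it uses a different MLP width, tied or untied embeddings, or a non-standard attention variant.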