
Transformer depth

Category: Architecture
Used in: 7 PRs
Best BPB (bits per byte, lower is better): 1.1550
Avg BPB: 1.2327

Hyperparameters Across PRs

PR number   Parameters
60          {"layers":10}
63          {"layers":10}
166         {"layers":10}
242         {"layers":10}
793         {"layers":10}
805         {"layers":11}
830         {"layers":11}
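Each PR records its depth setting as a small JSON object in the Parameters column. The following is a minimal sketch of how that column can be summarized, assuming every cell is a JSON object with a "layers" key (true of all seven rows here). The ROWS list and the layer_counts helper are hypothetical names introduced for illustration; they are not part of any PR.

```python
import json
from collections import Counter

# Rows copied from the hyperparameter table: (PR number, parameters as a JSON string).
ROWS = [
    (60, '{"layers":10}'),
    (63, '{"layers":10}'),
    (166, '{"layers":10}'),
    (242, '{"layers":10}'),
    (793, '{"layers":10}'),
    (805, '{"layers":11}'),
    (830, '{"layers":11}'),
]


def layer_counts(rows):
    """Hypothetical helper: map each layer count to the number of PRs that use it."""
    return Counter(json.loads(params)["layers"] for _, params in rows)


def prs_with_layers(rows, n):
    """Hypothetical helper: list the PR numbers whose parameters set layers == n."""
    return [pr for pr, params in rows if json.loads(params)["layers"] == n]


if __name__ == "__main__":
    print(sorted(layer_counts(ROWS).items()))  # [(10, 5), (11, 2)]
    print(prs_with_layers(ROWS, 11))  # [805, 830]
```

Read this way, five of the seven PRs use 10 layers, and the two highest-numbered PRs (805 and 830) use 11.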