depth/narrow transformer

Architecture
Used in: 1 PR
Best BPB: 1.3509
Avg BPB: 1.3509

Hyperparameters Across PRs

pr_number  parameters
71         {"layers":12,"model_dim":416}
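To give a feel for how small this configuration is, here is a rough sketch that estimates the weight count of the transformer blocks from the two listed hyperparameters. The function name `approx_block_params` and the `mlp_ratio` default are hypothetical. The block layout (four attention projections plus a two-matrix MLP with 4x expansion) is an assumption, since the page lists only `layers` and `model_dim`. Embeddings, biases, and norm parameters are left out.

```python
import json

# Hyperparameters exactly as listed for PR 71.
params = json.loads('{"layers":12,"model_dim":416}')


def approx_block_params(layers: int, model_dim: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for the transformer blocks only.

    Assumes (not stated in the source) standard blocks: 4*d^2 for the
    attention projections (Q, K, V, output) and 2*mlp_ratio*d^2 for a
    two-matrix MLP. Ignores embeddings, biases, and norm parameters.
    """
    attention = 4 * model_dim ** 2
    mlp = 2 * mlp_ratio * model_dim ** 2
    return layers * (attention + mlp)


estimate = approx_block_params(params["layers"], params["model_dim"])
print(f"{estimate:,} approximate block parameters")
```

Under these assumptions the estimate comes to roughly 24.9M block parameters. The real model in PR 71 may differ if it uses a different MLP width, tied or untied embeddings, or other variations.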