attention scaling

Architecture
Used in: 1 PR
Best BPB: 1.5990
Avg BPB: 1.5990

Hyperparameters Across PRs

pr_number    parameters
2159         {"num_heads":4}