GQA + RoPE

Architecture
Used in: 1 PR
Best BPB: 1.4072
Avg BPB: 1.4072

Hyperparameters Across PRs

pr_number    parameters
377          {"layers":[0,1,2,3,4]}
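This page lists results but does not describe the mechanism. The following is a rough illustration of what the two components usually mean: a minimal pure-Python sketch of grouped-query attention (GQA, where several query heads share one key/value head) with rotary position embeddings (RoPE) applied to queries and keys. It assumes standard choices that this page does not state: causal masking, a RoPE base of 10000, rotation of adjacent dimension pairs, and contiguous grouping of query heads onto KV heads. The function names are hypothetical, this is not the code from PR 377, and the page does not say how the `layers` list in its parameters is used.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotary embedding (assumed variant): rotate each adjacent pair
    (vec[i], vec[i+1]) by angle pos * base**(-i/d), for even i."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_rope_attention(q, k, v):
    """Causal GQA with RoPE (hypothetical helper).
    q: [n_q_heads][seq][d]; k, v: [n_kv_heads][seq][d]."""
    n_q, n_kv = len(q), len(k)
    assert n_q % n_kv == 0, "query heads must divide evenly into KV groups"
    group = n_q // n_kv
    seq, d = len(q[0]), len(q[0][0])
    scale = 1.0 / math.sqrt(d)
    # Keys are rotated once per KV head; every query head in the group reuses them.
    k_rot = [[rope(k[h][t], t) for t in range(seq)] for h in range(n_kv)]
    out = []
    for h in range(n_q):
        kv = h // group  # contiguous grouping: heads 0..group-1 -> KV head 0, etc.
        head_out = []
        for t in range(seq):
            qt = rope(q[h][t], t)
            # Causal mask: position t attends only to positions 0..t.
            scores = [scale * sum(a * b for a, b in zip(qt, k_rot[kv][s]))
                      for s in range(t + 1)]
            w = softmax(scores)
            dv = len(v[kv][0])
            head_out.append([sum(w[s] * v[kv][s][j] for s in range(t + 1))
                             for j in range(dv)])
        out.append(head_out)
    return out


if __name__ == "__main__":
    seq, d = 3, 4
    # 4 query heads share 2 KV heads (group size 2).
    q = [[[0.1 * (h + t + j) for j in range(d)] for t in range(seq)] for h in range(4)]
    k = [[[0.2 * (h - t + j) for j in range(d)] for t in range(seq)] for h in range(2)]
    v = [[[float(h * 10 + t) for _ in range(d)] for t in range(seq)] for h in range(2)]
    out = gqa_rope_attention(q, k, v)
    print(len(out), len(out[0]), len(out[0][0]))  # 4 3 4
```

The KV tensors here have half as many heads as the queries, which is the memory saving GQA is usually chosen for. Because RoPE rotates each pair by an angle proportional to position, the query-key dot product depends only on the relative offset between the two positions.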