
factorized late layers

Architecture
Used in: 1 PR
Best BPB: 1.2050
Avg BPB: 1.2050

Hyperparameters Across PRs

pr_number    parameters
1752