SwiGLU FFN

Architecture
Used in: 2 PRs
Best BPB: 1.0672
Avg BPB: 1.0926

Hyperparameters Across PRs

pr_number    parameters
462          (none listed)
505          {"hidden":1792}
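For orientation, here is a minimal sketch of what a SwiGLU feed-forward block with hidden width 1792 (the value listed for PR 505) might compute. It uses the common formulation down(SiLU(x·W_gate) * (x·W_up)); the PRs' actual code is not shown on this page, so the function names, the bias-free layout, the model width D_MODEL = 8, and the random initialisation are all assumptions made for illustration.

```python
import math
import random


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU FFN: down( SiLU(x @ w_gate) * (x @ w_up) ), no biases (assumed)."""
    gate = matmul(x, w_gate)
    up = matmul(x, w_up)
    gated = [
        [silu(g) * u for g, u in zip(g_row, u_row)]
        for g_row, u_row in zip(gate, up)
    ]
    return matmul(gated, w_down)


def init(rows, cols, rng, scale=0.02):
    """Hypothetical Gaussian initialisation, for illustration only."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


D_MODEL = 8    # hypothetical model width; not given on this page
HIDDEN = 1792  # "hidden" value listed for PR 505

rng = random.Random(0)
w_gate = init(D_MODEL, HIDDEN, rng)
w_up = init(D_MODEL, HIDDEN, rng)
w_down = init(HIDDEN, D_MODEL, rng)

x = init(2, D_MODEL, rng, scale=1.0)  # two example token vectors
y = swiglu_ffn(x, w_gate, w_up, w_down)
print(len(y), len(y[0]))  # prints "2 8"
```

Under these assumptions the block holds 3 × D_MODEL × HIDDEN weights (gate, up, and down projections), which is why SwiGLU layers are often given a smaller hidden width than a two-matrix FFN at the same parameter budget.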