
GELU pre-enrichment

Architecture
Used in: 4 PRs
Best BPB: 0.3922
Avg BPB: 0.9982

Hyperparameters Across PRs

pr_number  parameters
810        {"input_dim":512,"hidden_dim":768,"output_dim":512}
972        {"dimensions":[512,768,512]}
978        {"input_dim":512,"hidden_dim":768}
996        {"input_dim":512,"hidden_dim":768,"output_dim":512}
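The dimensions in the table suggest a small expand-then-project block (512 to 768 and back to 512), so a minimal sketch may help make the shapes concrete. Everything about the layout here is an assumption rather than something the PRs confirm: the Linear -> GELU -> Linear ordering, the presence of bias terms, the Gaussian initialization, and the names `pre_enrich`, `init_linear`, and `param_count` are all hypothetical. It uses only the standard library and the exact (erf-based) form of GELU.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def init_linear(in_dim: int, out_dim: int, rng: random.Random):
    """Hypothetical init: Gaussian weights scaled by 1/sqrt(in_dim), zero bias."""
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x, weight, bias):
    """Affine map y = W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def pre_enrich(x, up, down):
    """Assumed layout: Linear(input -> hidden) -> GELU -> Linear(hidden -> output)."""
    hidden = [gelu(h) for h in linear(x, *up)]
    return linear(hidden, *down)


def param_count(input_dim: int, hidden_dim: int, output_dim: int) -> int:
    """Weights plus biases for the two linear layers (biases are an assumption)."""
    return (input_dim * hidden_dim + hidden_dim) + (hidden_dim * output_dim + output_dim)


# Dimensions taken from the table (PRs 810 and 996).
INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM = 512, 768, 512

rng = random.Random(0)
up = init_linear(INPUT_DIM, HIDDEN_DIM, rng)
down = init_linear(HIDDEN_DIM, OUTPUT_DIM, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(INPUT_DIM)]
y = pre_enrich(x, up, down)

print(len(y))                                          # 512
print(param_count(INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM))  # 787712
```

Under these assumptions the block adds 787,712 parameters. PR 978 does not list an `output_dim`, so the sketch does not cover that configuration.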