ReLU²

Category: Architecture
Used in: 50 PRs
Best BPB (bits per byte): 0.0972
Avg BPB (bits per byte): 1.1312
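BPB is bits per byte of evaluated text, and lower is better. The exact evaluation protocol behind these figures is not stated on this page. The sketch below shows one common way such a number is derived from a language-model evaluation. It assumes the loss is mean per-token cross-entropy in nats and the byte count is the length of the evaluated text in bytes. The function name bits_per_byte is hypothetical.

```python
import math


def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean per-token cross-entropy (nats) into bits per byte.

    Assumption: mean_loss_nats is averaged over num_tokens tokens, and
    num_bytes is the byte length of the same evaluated text.
    """
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    total_nats = mean_loss_nats * num_tokens  # total information in nats
    total_bits = total_nats / math.log(2)     # nats -> bits
    return total_bits / num_bytes             # normalize per byte of text
```

Normalizing per byte rather than per token makes scores comparable across tokenizers. A model whose tokens each cover more bytes gets no free advantage from a lower per-token loss.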

Hyperparameters Across PRs

PR number  Parameters
85
587
623
903
920
922        {"mlp_multiplier":3}
923
958
975
992
1001
1029
1046
1055
1100
1101
1106
1111
1129
1136
1144
1180
1205       {"hidden":1536}
1226
1234
1237
1241
1246
1259
1268
1273       {"mlp_multiplier":4}
1293
1337
1403
1418
1447
1448
1449
1474
1488
1495
1496
1512
1555
1582       {"hidden_dim":1024}
1672       {"block_m":128,"block_n":128,"block_k":64}
1684
1691
1699       {"mlp_hidden_dim":1152}
1728
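ReLU² (squared ReLU) is the activation f(x) = max(0, x)², used here in a transformer MLP block. The sketch below is a minimal, bias-free MLP written with plain Python lists so that it is self-contained. The helper names are hypothetical. The reading of mlp_multiplier as "MLP hidden width = model width × multiplier" is an assumption based on the usual convention; it is not taken from any of the PRs listed above. The sketch makes no attempt to model the block_m/block_n/block_k settings.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[out_index][in_index]


def relu_squared(x: float) -> float:
    # ReLU² activation: clamp negatives to zero, then square.
    r = max(0.0, x)
    return r * r


def matvec(w: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product: one dot product per output row.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp_hidden_width(d_model: int, mlp_multiplier: int) -> int:
    # Assumption: the multiplier scales model width to MLP hidden width.
    return d_model * mlp_multiplier


def relu2_mlp(x: Vector, w_up: Matrix, w_down: Matrix) -> Vector:
    # Up-project to the hidden width, apply ReLU² elementwise,
    # then project back down to the model width (no biases).
    hidden = [relu_squared(v) for v in matvec(w_up, x)]
    return matvec(w_down, hidden)
```

Compared with plain ReLU, squaring keeps exact zeros for negative inputs but grows quadratically for positive ones. Its gradient, 2·max(0, x), is continuous at zero.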