parallel residuals

Architecture
Used in: 42 PRs
Best BPB: 1.0616
Avg BPB: 1.1408

Hyperparameters Across PRs

pr_number  parameters
1204       {"start_layer":7}
1274       {"start_layer":7}
1326       {"start_layer":7}
1333       {"start_layer":7}
1334       {"start_layer":7}
1338       {"start_layer":7}
1339       {"start_layer":7}
1381       {"start_layer":7}
1396       {"start_layer":7}
1412       {"start_layer":7}
1420       {"start_layer":7,"end_layer":10}
1425       {"start_layer":6}
1435       {"layers":"7+"}
1437       {"start_layer":7,"end_layer":10}
1450       {"layers":[7,8,9,10]}
1477       {"start_layer":7}
1485       {"start_layer":7}
1489       {"start_layer":7}
1492       {"layers":"7+"}
1493       {"layers":"7+"}
1499       {"start_layer":7}
1515       {"start_layer":7}
1521
1532       {"layers":"7+"}
1534
1541       {"start_layer":7,"new_scalar_params":66}
1570       {"start_layer":7}
1578
1614       {"start_layer":7}
1620
1635       {"start_layer":7}
1647
1661       {"start_layer":7}
1667       {"start_layer":7}
1720       {"start_layer":7}
1725
1731       {"start_layer":7}
1733
1737       {"start_layer":7}
1750
1755       {"start_layer":7}
1760       {"start_layer":7}
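
The parameters column records the layer selection in several shapes: a start_layer alone, a start_layer with an end_layer, a "7+" string, or an explicit list. To compare PRs, it may help to normalize these into one list of layer indices. The following is a minimal sketch, not the tracker's own code. It assumes zero-based layer indices and an inclusive end_layer, and it takes a hypothetical num_layers argument because the total layer count is not given on this page. The function name layer_spec is also illustrative.

```python
import json


def layer_spec(raw, num_layers):
    """Normalize one parameters cell into a sorted list of layer indices.

    Assumptions (not confirmed by the page): indices are zero-based,
    "end_layer" is inclusive, and an open-ended selection runs to the
    last layer, num_layers - 1. Returns None for an empty cell.
    """
    if not raw.strip():
        return None  # PR listed without parameters

    params = json.loads(raw)

    if "layers" in params:
        layers = params["layers"]
        if isinstance(layers, str):
            # Open-ended form such as "7+"
            if not layers.endswith("+"):
                raise ValueError(f"unrecognized layers string: {layers!r}")
            return list(range(int(layers[:-1]), num_layers))
        return sorted(layers)  # explicit list such as [7, 8, 9, 10]

    start = params["start_layer"]
    end = params.get("end_layer", num_layers - 1)  # assumed inclusive
    return list(range(start, end + 1))
```

Under these assumptions and with num_layers set to 11, the cells {"start_layer":7}, {"start_layer":7,"end_layer":10}, {"layers":"7+"} and {"layers":[7,8,9,10]} all normalize to the same four layers. Keys the sketch does not know, such as new_scalar_params, are ignored.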