
recursive weight sharing

Category: Architecture
Used in: 1 PR
Best BPB (bits per byte): 1.1355
Avg BPB: 1.1355

Hyperparameters Across PRs

PR number    Parameters
579          {"unique_blocks":6,"loops":2,"effective_depth":12,"MLP_expansion":"4x"}
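A minimal sketch, in plain Python, of how the PR 579 hyperparameters fit together. Reusing 6 unique blocks across 2 loops gives an effective depth of 6 × 2 = 12 while storing only 6 blocks' worth of weights. Several details here are assumptions made for illustration and are not taken from PR 579: the loop order (the whole 6-block stack replayed twice, rather than each block applied twice in a row), the helper names `layer_schedule`, `approx_block_params` and `run_shared`, the toy blocks, and the parameter-count formula. That formula assumes standard attention with four d_model × d_model projections plus a two-matrix MLP at the given expansion, and it ignores biases and norms.

```python
from typing import Callable, List


def layer_schedule(unique_blocks: int, loops: int) -> List[int]:
    """Index of the shared block applied at each effective layer.

    Assumption: the full stack of unique blocks is replayed `loops` times
    (0, 1, ..., n-1, 0, 1, ..., n-1), so effective depth = unique_blocks * loops.
    """
    return [i for _ in range(loops) for i in range(unique_blocks)]


def approx_block_params(d_model: int, mlp_expansion: int = 4) -> int:
    """Rough weight count for one block under an assumed standard layout.

    Attention: Q, K, V, O projections, each d_model x d_model -> 4 * d^2.
    MLP: up and down projections with hidden size mlp_expansion * d_model
    -> 2 * mlp_expansion * d^2. Biases and normalization weights are ignored.
    """
    return 4 * d_model * d_model + 2 * mlp_expansion * d_model * d_model


def run_shared(x, blocks: List[Callable], loops: int):
    """Apply the shared blocks following layer_schedule; weights are reused, not copied."""
    for idx in layer_schedule(len(blocks), loops):
        x = blocks[idx](x)
    return x


if __name__ == "__main__":
    schedule = layer_schedule(unique_blocks=6, loops=2)
    print("schedule:", schedule)
    print("effective depth:", len(schedule))

    d_model = 512  # hypothetical width, not from PR 579
    per_block = approx_block_params(d_model, mlp_expansion=4)
    print("stored weights, 6 shared blocks:", 6 * per_block)
    print("stored weights, 12 independent blocks:", 12 * per_block)

    # Toy blocks: block k adds k, so the result shows each block ran twice.
    toy_blocks = [lambda v, k=k: v + k for k in range(6)]
    print("toy forward pass:", run_shared(0, toy_blocks, loops=2))
```

With the hypothetical d_model of 512, this formula gives about 3.1M weights per block, so the 6 shared blocks store about 18.9M weights versus about 37.7M for 12 independent blocks. Compute per token still scales with the effective depth of 12, because every shared block runs once per loop.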