shared sparse sidecar

Architecture
Used in: 1 PR
Best BPB: 1.0916
Avg BPB: 1.0916

Hyperparameters Across PRs

pr_number    parameters
555          {"start_layer":8,"hidden_dim":48}