learned mixer head

Architecture
Used in: 1 PR
Best BPB: 0.1582
Avg BPB: 0.1582

Hyperparameters Across PRs

pr_number    parameters
859          {"input_dim":512,"output_dim":7}
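
The page gives only the two dimensions, so the following is a minimal sketch of one way a head with these hyperparameters could work, not the implementation from PR 859. It assumes the head is a single linear projection from a 512-dimensional hidden vector to 7 logits, followed by a softmax that turns the logits into mixing weights. The function names, the initialization, and the softmax are all hypothetical.

```python
import json
import math
import random

# Hyperparameters copied from the table row for PR 859.
params = json.loads('{"input_dim":512,"output_dim":7}')


def make_head(input_dim, output_dim, seed=0):
    """Build a randomly initialized linear head (assumed structure)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(input_dim)
    weights = [
        [rng.uniform(-scale, scale) for _ in range(input_dim)]
        for _ in range(output_dim)
    ]
    bias = [0.0] * output_dim
    return weights, bias


def mixer_weights(hidden, weights, bias):
    """Project a hidden vector to logits, then apply a stable softmax."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


weights, bias = make_head(params["input_dim"], params["output_dim"])

# Placeholder hidden state; a real model would supply its own activations.
hidden = [0.01 * (i % 5) for i in range(params["input_dim"])]
mix = mixer_weights(hidden, weights, bias)

# Parameter count under the single-linear-layer assumption: weights plus biases.
n_params = params["input_dim"] * params["output_dim"] + params["output_dim"]
```

Under this assumption the head is small: 512 * 7 weights plus 7 biases. The resulting `mix` holds 7 non-negative values that sum to 1, which could weight 7 component predictions.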