multi-model single representation

Architecture
Used in: 1 PR
Best BPB: 1.2450
Avg BPB: 1.2450

Hyperparameters Across PRs

pr_number    parameters
1352         {"num_models":3,"model_types":["transformer","mlp","causal_depthwise"],"model_dims":[468,198,186],"shared_representation_dim":852,"cross_attention_dim":480}
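
One thing the numbers for PR 1352 do show: the three sub-model widths sum to the shared representation width (468 + 198 + 186 = 852). That is consistent with the shared representation being a concatenation of the sub-model outputs, but the page does not say so, and concatenation is only an assumption here. A minimal Python sketch of the consistency check follows; the helper name `summarize` and the keys of its return value are hypothetical and chosen purely for illustration.

```python
import json

# Hyperparameters copied verbatim from the table row for PR 1352.
RAW = (
    '{"num_models":3,'
    '"model_types":["transformer","mlp","causal_depthwise"],'
    '"model_dims":[468,198,186],'
    '"shared_representation_dim":852,'
    '"cross_attention_dim":480}'
)


def summarize(raw: str) -> dict:
    """Parse the hyperparameter JSON and check its internal consistency.

    Hypothetical helper: it only inspects the numbers and makes no claim
    about how the PR actually combines the sub-models.
    """
    cfg = json.loads(raw)
    dims = cfg["model_dims"]
    return {
        # Width assigned to each sub-model type, in listed order.
        "per_model": dict(zip(cfg["model_types"], dims)),
        # Total width across all sub-models.
        "dims_sum": sum(dims),
        # True when the widths add up to the shared representation width
        # (what one would expect under the concatenation assumption).
        "matches_shared_dim": sum(dims) == cfg["shared_representation_dim"],
        # True when num_models agrees with both list lengths.
        "counts_consistent": (
            len(dims) == len(cfg["model_types"]) == cfg["num_models"]
        ),
    }


summary = summarize(RAW)
print(summary)
```

The `cross_attention_dim` of 480 is not derivable from the other values in this row, so the sketch leaves it unchecked.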