Hybrid Attention + Mamba SSM
Architecture
Used in
1 PR
Best BPB
1.1828
Avg BPB
1.1828
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 599 | {"layers":7,"attention_heads":8,"kv_heads":4,"ssm_state_size":8,"mlp_multiplier":4} |
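
The parameters cell is a JSON object, so it can be read programmatically. The snippet below is a minimal sketch, not part of the original submission: it parses the row for PR 599 and derives how many query heads share each key/value head. It assumes `kv_heads` being smaller than `attention_heads` indicates grouped-query attention; the page does not state this. The variable names `raw`, `params`, and `queries_per_kv_head` are illustrative.

```python
import json

# Parameters string copied verbatim from the table row for PR 599.
raw = '{"layers":7,"attention_heads":8,"kv_heads":4,"ssm_state_size":8,"mlp_multiplier":4}'
params = json.loads(raw)

# Assumption: fewer KV heads than attention heads means grouped-query
# attention, where each key/value head serves several query heads.
if params["attention_heads"] % params["kv_heads"] != 0:
    raise ValueError("attention_heads must be divisible by kv_heads")
queries_per_kv_head = params["attention_heads"] // params["kv_heads"]

# Print each hyperparameter on its own line for easier reading.
for name, value in params.items():
    print(f"{name}: {value}")
print(f"query heads per KV head: {queries_per_kv_head}")
```

Under that assumption, the 8 attention heads and 4 KV heads listed for PR 599 give 2 query heads per key/value head.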