
Hybrid Attention + Mamba SSM

Architecture
Used in: 1 PR
Best BPB: 1.1828
Avg BPB: 1.1828

Hyperparameters Across PRs

pr_number  parameters
599        {"layers":7,"attention_heads":8,"kv_heads":4,"ssm_state_size":8,"mlp_multiplier":4}
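The parameter record for PR 599 can be read programmatically. The sketch below is illustrative only: it parses the JSON exactly as listed and derives two quantities under stated assumptions. It assumes that kv_heads being smaller than attention_heads indicates grouped-query attention, and that mlp_multiplier scales the model width. The helper names and the model_dim argument are hypothetical; the page does not list a model width.

```python
import json

# Hyperparameter record copied verbatim from the row for PR 599.
raw = '{"layers":7,"attention_heads":8,"kv_heads":4,"ssm_state_size":8,"mlp_multiplier":4}'
params = json.loads(raw)


def queries_per_kv_head(p):
    """Query heads sharing each key/value head.

    Assumption: kv_heads < attention_heads means grouped-query attention.
    """
    if p["attention_heads"] % p["kv_heads"] != 0:
        raise ValueError("attention_heads must be divisible by kv_heads")
    return p["attention_heads"] // p["kv_heads"]


def mlp_hidden_width(p, model_dim):
    """MLP hidden width, assuming the multiplier scales the model width.

    model_dim is a hypothetical input; it is not given on the page.
    """
    return p["mlp_multiplier"] * model_dim


group_size = queries_per_kv_head(params)
print(group_size)  # 8 query heads / 4 KV heads -> 2
```

Under these assumptions, each key/value head would serve two query heads, which would halve the key/value cache relative to full multi-head attention with the same number of query heads.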