Mamba (Architecture)

Used in: 13 PRs
Best BPB: 1.1470
Avg BPB: 1.2857
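BPB (bits per byte) normalizes the model's cross-entropy loss by the raw byte count of the evaluation text rather than by token count, so submissions with different tokenizers remain comparable. A minimal sketch of the conversion (the function name and the example token/byte counts are illustrative, not taken from this leaderboard):

```python
import math

def bits_per_byte(ce_nats_per_token: float, total_tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) to bits per byte (BPB).

    Total nats over the corpus are converted to bits (divide by ln 2),
    then normalized by the raw byte length of the text.
    """
    total_nats = ce_nats_per_token * total_tokens
    return total_nats / (math.log(2) * total_bytes)

# Illustrative numbers: 2.5 nats/token, roughly 3.1 bytes per token.
print(round(bits_per_byte(2.5, total_tokens=1_000, total_bytes=3_100), 4))  # 1.1635
```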
Submissions

| PR | Author | BPB |
|---|---|---|
| #914 | mkenney2 | 1.1873 |
| #1107 | mradassaad | 1.5633 |
| #1245 | mkenney2 | 1.1470 |
| #1342 | nicholasbailey87 | 1.4816 |
| #1355 | mradassaad | 1.1526 |
| #1524 | Jash-Vora | 1.2552 |
| #1525 | Jash-Vora | 1.2552 |
| #1574 | KRGulaj | 1.3587 |
| #1643 | mradassaad | 1.1473 |
| #1757 | aiejvn | 1.5194 |
| #1994 | potatonyliu | 1.3004 |
| #2070 | vardanbobo007 | 1.1730 |
| #2073 | vardanbobo007 | 1.1726 |

Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 914 | — |
| 1107 | {"layers":8,"mamba_layers":7,"attention_layers":1,"dim":512,"d_state":64,"mlp_mult":3,"seq_len":4096} |
| 1245 | — |
| 1342 | {"layers":12,"d_model":512,"d_inner":1024,"d_state":64,"d_conv":4,"headdim":64} |
| 1355 | {"layers":7,"dim":512,"d_state":64,"seq_len":4096} |
| 1524 | {"layer":5,"d_model":512,"d_state":64,"d_conv":4,"expand":2} |
| 1525 | {"layer":5,"d_model":512,"d_state":64,"d_conv":4,"expand":2} |
| 1574 | {"d_model":640,"d_inner":1280,"d_state":34,"d_conv":4,"num_layers":8,"head_adapter_rank":16,"vocab_size":1056} |
| 1643 | {"layers":7,"attn_layers":2,"dim":512,"d_state":64,"expand":2,"headdim":64,"chunk_size":64,"mlp_mult":3} |
| 1757 | {"outer_layers":1,"layers":9,"encoder_layers":1,"main_layers":7,"decoder_layers":1,"kv_heads":4,"heads":8,"dim":512} |
| 1994 | {"d_state":64,"expand":2,"chunk_size":64,"headdim":64} |
| 2070 | {"d_state":128,"d_conv":4,"expand":2,"head_dim":64} |
| 2073 | {"d_state":128,"d_conv":4,"expand":2,"head_dim":64} |
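The hyperparameter keys are not uniform across PRs (`dim` vs `d_model`, `layer` vs `layers`, and `d_inner` sometimes given explicitly, sometimes implied by `expand`). A hypothetical normalized config sketch, assuming the standard Mamba relation d_inner = expand × d_model (this dataclass is illustrative and does not appear in any of the PRs):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MambaConfig:
    """Normalized view of the per-PR hyperparameters in the table above.

    Standardizes on one spelling per field and derives d_inner from
    `expand` when it is not specified explicitly.
    """
    layers: int
    d_model: int
    d_state: int = 64
    d_conv: int = 4
    expand: int = 2
    d_inner: Optional[int] = None

    def __post_init__(self) -> None:
        if self.d_inner is None:
            self.d_inner = self.expand * self.d_model

# A PR #1524/#1525-style config: d_inner is derived as 2 * 512 = 1024,
# which matches the explicit d_inner in PR #1342's config.
cfg = MambaConfig(layers=5, d_model=512)
print(cfg.d_inner)  # 1024
```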