Mamba
Architecture
Used in: 10 PRs
Best BPB: 1.1470
Avg BPB: 1.3068
Submissions
Hyperparameters Across PRs
| PR number | Hyperparameters |
|---|---|
| 914 | — |
| 1107 | {"layers":8,"mamba_layers":7,"attention_layers":1,"dim":512,"d_state":64,"mlp_mult":3,"seq_len":4096} |
| 1245 | — |
| 1342 | {"layers":12,"d_model":512,"d_inner":1024,"d_state":64,"d_conv":4,"headdim":64} |
| 1355 | {"layers":7,"dim":512,"d_state":64,"seq_len":4096} |
| 1524 | {"layer":5,"d_model":512,"d_state":64,"d_conv":4,"expand":2} |
| 1525 | {"layer":5,"d_model":512,"d_state":64,"d_conv":4,"expand":2} |
| 1574 | {"d_model":640,"d_inner":1280,"d_state":34,"d_conv":4,"num_layers":8,"head_adapter_rank":16,"vocab_size":1056} |
| 1643 | {"layers":7,"attn_layers":2,"dim":512,"d_state":64,"expand":2,"headdim":64,"chunk_size":64,"mlp_mult":3} |
| 1757 | {"outer_layers":1,"layers":9,"encoder_layers":1,"main_layers":7,"decoder_layers":1,"kv_heads":4,"heads":8,"dim":512} |
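The rows above use different key names for what appears to be the same quantity (`layers`, `layer`, `num_layers`; `dim`, `d_model`), and some give `d_inner` directly while others give `expand`. The following is a minimal sketch of how the JSON cells could be put on a common footing for comparison. The `ALIASES` map and the helpers `normalize` and `inner_width` are hypothetical names introduced here for illustration, and treating the aliased keys as equivalent is an assumption. The relation `d_inner = expand * d_model` follows the usual Mamba convention and agrees with the rows that list `d_inner` explicitly (512 to 1024, 640 to 1280).

```python
import json

# Hypothetical alias map: an assumption that these keys mean the same thing
# across PRs. The PRs themselves do not state this.
ALIASES = {
    "layer": "layers",
    "num_layers": "layers",
    "dim": "d_model",
}

MISSING = "\u2014"  # the dash used in the table for PRs without parameters


def normalize(raw: str) -> dict:
    """Parse one JSON cell from the table and rename aliased keys."""
    raw = raw.strip()
    if raw in ("", MISSING):
        return {}
    params = json.loads(raw)
    return {ALIASES.get(key, key): value for key, value in params.items()}


def inner_width(params: dict):
    """Return d_inner as given, or derive it as expand * d_model."""
    if "d_inner" in params:
        return params["d_inner"]
    if "expand" in params and "d_model" in params:
        return params["expand"] * params["d_model"]
    return None  # not enough information in this row


# Three cells copied from the table above.
rows = {
    914: MISSING,
    1342: '{"layers":12,"d_model":512,"d_inner":1024,"d_state":64,"d_conv":4,"headdim":64}',
    1524: '{"layer":5,"d_model":512,"d_state":64,"d_conv":4,"expand":2}',
}

for pr, raw in rows.items():
    params = normalize(raw)
    print(pr, params.get("layers"), params.get("d_model"), inner_width(params))
```

Rows that carry neither `d_inner` nor `expand` (for example PR 1355) yield `None` for the inner width rather than a guessed value.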