Hybrid Architecture

Used in: 18 PRs
Best BPB: 0.5755
Avg BPB: 1.3042
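The summary figures above can be cross-checked against the per-PR scores in the Submissions list below. The following is a minimal sketch, assuming that "Best BPB" means the lowest score and that "Avg BPB" is the unweighted mean over all listed PRs; the variable names are illustrative and not part of the page.

```python
# BPB scores copied from the Submissions list, keyed by PR number.
scores = {
    992: 1.4054, 998: 0.5755, 1007: 1.2252, 1044: 1.8989, 1061: 1.3379,
    1142: 1.1493, 1198: 1.5992, 1346: 1.2283, 1491: 1.6924, 1644: 1.1473,
    1665: 1.3571, 1685: 1.7622, 1829: 1.2047, 1890: 1.1456, 1994: 1.3004,
    2066: 1.1005, 2070: 1.1730, 2073: 1.1726,
}

# Assumption: lower BPB is better, so "best" is the minimum.
best_pr = min(scores, key=scores.get)
best_bpb = scores[best_pr]

# Assumption: the average is a plain mean, one vote per PR.
avg_bpb = sum(scores.values()) / len(scores)

print(f"PRs: {len(scores)}, best: #{best_pr} ({best_bpb}), avg: {avg_bpb:.4f}")
```

Under these assumptions the recomputed count, minimum, and mean agree with the 18 PRs, 0.5755, and 1.3042 reported above.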
Submissions
| PR | Author | BPB |
|---|---|---|
| #992 | TimS-ml | 1.4054 |
| #998 | asuramaya | 0.5755 |
| #1007 | dillon-blake | 1.2252 |
| #1044 | greqone | 1.8989 |
| #1061 | rolandnsharp | 1.3379 |
| #1142 | ymrohit | 1.1493 |
| #1198 | ymrohit | 1.5992 |
| #1346 | shasank0001 | 1.2283 |
| #1491 | wisebreadloaf | 1.6924 |
| #1644 | mradassaad | 1.1473 |
| #1665 | mrbese | 1.3571 |
| #1685 | butbutt42 | 1.7622 |
| #1829 | estesryan | 1.2047 |
| #1890 | mradassaad | 1.1456 |
| #1994 | potatonyliu | 1.3004 |
| #2066 | SarooshKhan897 | 1.1005 |
| #2070 | vardanbobo007 | 1.1730 |
| #2073 | vardanbobo007 | 1.1726 |

Hyperparameters Across PRs

| pr_number | parameters |
|---|---|
| 992 | — |
| 998 | — |
| 1007 | {"layers":11,"attention_layers":3,"token_shift_layers":8} |
| 1044 | {"layers":9,"d_model":512,"heads":8,"kv_heads":4,"chunk_ratio":0.25} |
| 1061 | {"oscillators":192,"layers":12,"heads":16} |
| 1142 | {"lfb_layers":6,"lfb_dim":80,"lfb_bigram_vocab_size":2048} |
| 1198 | {"diffusion_num_steps":8,"diffusion_block_min":24,"diffusion_block_max":128,"diffusion_min_mask_frac":0.1,"diffusion_max_mask_frac":0.6,"diffusion_block_start_min_frac":0.25,"diffusion_block_start_max_frac":0.9,"diffusion_time_scale":0.05,"diffusion_refine_last_n":5,"diffusion_batch_shared_block":1} |
| 1346 | {"layers":15,"mlp_only":true,"tail_block":"blocks.14.mlp"} |
| 1491 | {"layers":4,"model_dim":256,"num_heads":4,"num_kv_heads":4,"num_streams":4,"num_fracs":1} |
| 1644 | {"layers":7,"ssm_blocks":5,"attention_layers":2,"dim":512,"d_state":64,"expand":2,"headdim":64,"chunk_size":64,"mlp_mult":3} |
| 1665 | {"layers":8,"mamba_blocks":6,"attention_blocks":2,"attention_positions":[2,5],"dim":512,"d_state":128,"ngroups":1,"expand":2} |
| 1685 | {"layers":9,"dimensions":384,"heads":6,"predictor_mlp_layers":2} |
| 1829 | {"layers":8,"model_dim":512,"num_heads":8,"num_kv_heads":4,"ssm_layers":7,"ssm_state_dim":128,"ssm_num_groups":8,"no_attn_layers":[3,6]} |
| 1890 | {"layers":7,"ssm_blocks":5,"attention_layers":2,"dim":512,"d_state":64,"expand":2,"headdim":64,"chunk_size":64,"mlp_mult":3} |
| 1994 | — |
| 2066 | {"layers":10,"attention_layers":[6],"embed_dim":512,"d_state":64,"headdim":64} |
| 2070 | {"layers":["nT","M","M","T","M","M","T","M","T","nT"]} |
| 2073 | {"layers":["nT","M","M","T","M","M","T","M","T","nT"]} |
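
The `parameters` cells do not share a schema: `layers` is an integer in most rows but a list of per-layer codes for PRs 2070 and 2073, and some rows carry no parameters at all. The sketch below shows one way to normalize the column to a plain layer count before comparing rows. The helper `layer_count` is hypothetical, and the code makes no assumption about what the codes `"nT"`, `"M"`, and `"T"` stand for; it only counts them.

```python
import json

# Sample cells copied verbatim from the `parameters` column.
rows = {
    992: "—",
    1007: '{"layers":11,"attention_layers":3,"token_shift_layers":8}',
    2066: '{"layers":10,"attention_layers":[6],"embed_dim":512,"d_state":64,"headdim":64}',
    2070: '{"layers":["nT","M","M","T","M","M","T","M","T","nT"]}',
}

def layer_count(cell):
    """Return the number of layers, or None if the cell does not give one."""
    if cell.strip() in ("", "—"):
        return None  # no parameters recorded for this PR
    layers = json.loads(cell).get("layers")
    if isinstance(layers, list):
        return len(layers)  # per-layer codes: count the entries
    return layers  # already an integer, or None if the key is absent

counts = {pr: layer_count(cell) for pr, cell in rows.items()}
print(counts)
```

Rows that name their depth under a different key (for example `lfb_layers` in PR 1142) come back as `None` here and would need their own handling.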