Architecture: other

Used in: 25 PRs
Best BPB: 0.4961
Avg BPB: 1.2067
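
As a sanity check, the summary figures above can be reproduced from the 25 per-submission scores listed under Submissions. This is a minimal sketch, assuming that lower BPB is better (so Best BPB is the minimum) and that Avg BPB is an unweighted mean over every listed entry, including the repeated PR #895 and PR #1703 entries. The variable names are illustrative and not part of the site.

```python
# BPB scores copied from the Submissions list, in listed order (25 entries).
bpb_scores = [
    1.0889, 1.0889, 1.1211, 1.0857, 0.4961,
    1.7942, 1.1247, 1.5568, 1.3440, 1.6110,
    1.0744, 1.3587, 1.3246, 1.2699, 1.1016,
    1.0832, 1.0832, 1.0832, 1.3825, 1.0996,
    1.2313, 1.3569, 1.0817, 1.0611, 1.2646,
]

# Assumption: lower BPB is better, so "best" is the minimum.
best_bpb = min(bpb_scores)

# Assumption: the average is an unweighted mean over every listed entry.
avg_bpb = sum(bpb_scores) / len(bpb_scores)

print(len(bpb_scores), f"{best_bpb:.4f}", f"{avg_bpb:.4f}")
# prints: 25 0.4961 1.2067
```

Under these assumptions the computed values match the reported Best BPB and Avg BPB to four decimal places.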
Submissions

| PR | Author | BPB | Status |
|---|---|---|---|
| #895 | iverbovoy | 1.0889 | |
| #895 | iverbovoy | 1.0889 | |
| #928 | autocode-rayes | 1.1211 | |
| #988 | ymrohit | 1.0857 | |
| #1083 | newjordan | 0.4961 | |
| #1152 | ericdatum | 1.7942 | |
| #1323 | sohv | 1.1247 | |
| #1411 | Blakethefn | 1.5568 | |
| #1481 | Cayton-Tech | 1.3440 | |
| #1490 | wisebreadloaf | 1.6110 | |
| #1529 | msisovic | 1.0744 | RECORD |
| #1574 | KRGulaj | 1.3587 | |
| #1627 | mike-ferguson | 1.3246 | |
| #1654 | IshiPareek | 1.2699 | |
| #1701 | Buld1n | 1.1016 | |
| #1703 | Buld1n | 1.0832 | |
| #1703 | Buld1n | 1.0832 | |
| #1703 | Buld1n | 1.0832 | |
| #1821 | anjing00monyet-arch | 1.3825 | |
| #1894 | ChideraIbe123 | 1.0996 | |
| #1957 | mhlov000111 | 1.2313 | |
| #2004 | corbensorenson | 1.3569 | |
| #2010 | Abhishek8108 | 1.0817 | |
| #2017 | Armigerous | 1.0611 | |
| #2056 | FF-GardenFn | 1.2646 | |

Hyperparameters Across PRs

| pr_number | parameters |
|---|---|
| 895 | {"learned_scales":true} |
| 895 | — |
| 928 | {"extra_params":22} |
| 988 | {"name":"LSWA-64x4","latent_channels":64,"workspace_slots":4,"heads":4,"think_steps":1,"active_from_block":5} |
| 1083 | {"inst_dim":32,"flow":true,"dn":0,"causality_fixed":true} |
| 1152 | {"neurons":300,"sensory":88,"motor":123,"hops":6,"density":0.0565} |
| 1323 | {"extra_params":152000,"bottleneck_dim":64} |
| 1411 | {"dimension":64} |
| 1481 | {"ranks_tested":[64,128]} |
| 1490 | {"layers":[1,3]} |
| 1529 | {"parallel_residual_start":8} |
| 1574 | {"d_state":34} |
| 1627 | {"layer_traversal_mode":"odds_then_evens"} |
| 1654 | — |
| 1701 | {"groups":16,"scale":0.35} |
| 1703 | {"loop_inject_enabled":1,"loop_inject_scale":1,"loop_inject_start_pass":1,"loop_inject_init":0.1} |
| 1703 | {"use_pass_readout":1,"readout_groups":16,"readout_scale":0.35} |
| 1703 | {"enable_parallel_residual_at_step":0,"parallel_residual_start":7} |
| 1821 | {"tiers":4,"top_fp16":256,"int8_range":[256,2047],"int6_range":[2048,16383],"int4_range":[16384,57600]} |
| 1894 | {"method":"RECUR_AB"} |
| 1957 | — |
| 2004 | {"layers":["input","loop_first"],"experts":16,"rank":2} |
| 2010 | {"token_order":16,"word_order":4} |
| 2017 | {"vocab_size":8192,"d_model":512} |
| 2056 | {"gate_type":"cummax-based within-chunk gate"} |
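
Each `parameters` cell in the table above is either a JSON object or a dash placeholder for entries that report no parameters. The following is a minimal sketch of loading such cells with Python's standard `json` module. The helper name `parse_parameters` and the choice to map the placeholder to an empty dict are assumptions made for illustration, not something the page specifies.

```python
import json

# A few rows copied from the table above as (pr_number, parameters cell).
rows = [
    (895, '{"learned_scales":true}'),
    (895, "\u2014"),  # dash placeholder cell: no parameters reported
    (1481, '{"ranks_tested":[64,128]}'),
    (1701, '{"groups":16,"scale":0.35}'),
]


def parse_parameters(cell):
    """Return the cell as a dict, or {} for an empty or placeholder cell."""
    cell = cell.strip()
    if cell in ("", "\u2014"):
        return {}
    return json.loads(cell)


parsed = [(pr, parse_parameters(cell)) for pr, cell in rows]
for pr, params in parsed:
    print(pr, params)
```

Keeping one parsed entry per table row, instead of merging by `pr_number`, preserves the repeated rows for PR #895 and PR #1703, which line up one-to-one with the repeated entries in the Submissions list.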