other
Architecture
Used in: 18 PRs
Best BPB: 0.4961
Avg BPB: 1.2050
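The summary figures above can be reproduced from the scores in the Submissions list. The sketch below is a sanity check, not the site's actual aggregation code. It assumes that lower BPB is better and that the average is a plain mean over all 18 listed entries, including PRs that appear more than once.

```python
# BPB scores copied from the Submissions list, in listing order.
scores = [
    1.0889, 1.0889, 1.1211, 1.0857, 0.4961, 1.7942,
    1.1247, 1.5568, 1.3440, 1.6110, 1.0744, 1.3587,
    1.3246, 1.2699, 1.1016, 1.0832, 1.0832, 1.0832,
]

count = len(scores)
best_bpb = min(scores)         # assumption: lower BPB is better
avg_bpb = sum(scores) / count  # assumption: unweighted mean over every entry

print(count, best_bpb, f"{avg_bpb:.4f}")  # prints: 18 0.4961 1.2050
```

Under these assumptions the count, best, and average all match the figures shown on the page.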
Submissions
| PR | Author | BPB |
|---|---|---|
| #895 | iverbovoy | 1.0889 |
| #895 | iverbovoy | 1.0889 |
| #928 | autocode-rayes | 1.1211 |
| #988 | ymrohit | 1.0857 |
| #1083 | newjordan | 0.4961 |
| #1152 | ericdatum | 1.7942 |
| #1323 | sohv | 1.1247 |
| #1411 | Blakethefn | 1.5568 |
| #1481 | Cayton-Tech | 1.3440 |
| #1490 | wisebreadloaf | 1.6110 |
| #1529 | msisovic | 1.0744 |
| #1574 | KRGulaj | 1.3587 |
| #1627 | mike-ferguson | 1.3246 |
| #1654 | IshiPareek | 1.2699 |
| #1701 | Buld1n | 1.1016 |
| #1703 | Buld1n | 1.0832 |
| #1703 | Buld1n | 1.0832 |
| #1703 | Buld1n | 1.0832 |

Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 895 | {"learned_scales":true} |
| 895 | — |
| 928 | {"extra_params":22} |
| 988 | {"name":"LSWA-64x4","latent_channels":64,"workspace_slots":4,"heads":4,"think_steps":1,"active_from_block":5} |
| 1083 | {"inst_dim":32,"flow":true,"dn":0,"causality_fixed":true} |
| 1152 | {"neurons":300,"sensory":88,"motor":123,"hops":6,"density":0.0565} |
| 1323 | {"extra_params":152000,"bottleneck_dim":64} |
| 1411 | {"dimension":64} |
| 1481 | {"ranks_tested":[64,128]} |
| 1490 | {"layers":[1,3]} |
| 1529 | {"parallel_residual_start":8} |
| 1574 | {"d_state":34} |
| 1627 | {"layer_traversal_mode":"odds_then_evens"} |
| 1654 | — |
| 1701 | {"groups":16,"scale":0.35} |
| 1703 | {"loop_inject_enabled":1,"loop_inject_scale":1,"loop_inject_start_pass":1,"loop_inject_init":0.1} |
| 1703 | {"use_pass_readout":1,"readout_groups":16,"readout_scale":0.35} |
| 1703 | {"enable_parallel_residual_at_step":0,"parallel_residual_start":7} |
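Each cell in the parameters column is a JSON object, so the table can be loaded for comparison across PRs. The sketch below uses a few rows copied from the table. The helper name `parse_parameters` is hypothetical, and treating the "—" placeholder as an empty parameter set is an assumption.

```python
import json

# A few (pr_number, parameters) rows copied from the table above.
rows = [
    (895, '{"learned_scales":true}'),
    (895, "—"),
    (1323, '{"extra_params":152000,"bottleneck_dim":64}'),
    (1701, '{"groups":16,"scale":0.35}'),
]

def parse_parameters(cell):
    """Hypothetical helper: decode a JSON cell, or return {} for the "—" placeholder."""
    cell = cell.strip()
    return {} if cell == "—" else json.loads(cell)

parsed = [(pr, parse_parameters(cell)) for pr, cell in rows]

for pr, params in parsed:
    # List the parameter names recorded for each row.
    print(pr, sorted(params))
```

Because a PR number can appear on several rows (895 and 1703 here), the rows are kept as a list instead of being keyed by PR number, which would silently drop the repeated entries.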