
other

Architecture
Used in: 18 PRs
Best BPB: 0.4961
Avg BPB: 1.2050

Hyperparameters Across PRs

pr_number  parameters
895        {"learned_scales":true}
895
928        {"extra_params":22}
988        {"name":"LSWA-64x4","latent_channels":64,"workspace_slots":4,"heads":4,"think_steps":1,"active_from_block":5}
1083       {"inst_dim":32,"flow":true,"dn":0,"causality_fixed":true}
1152       {"neurons":300,"sensory":88,"motor":123,"hops":6,"density":0.0565}
1323       {"extra_params":152000,"bottleneck_dim":64}
1411       {"dimension":64}
1481       {"ranks_tested":[64,128]}
1490       {"layers":[1,3]}
1529       {"parallel_residual_start":8}
1574       {"d_state":34}
1627       {"layer_traversal_mode":"odds_then_evens"}
1654
1701       {"groups":16,"scale":0.35}
1703       {"loop_inject_enabled":1,"loop_inject_scale":1,"loop_inject_start_pass":1,"loop_inject_init":0.1}
1703       {"use_pass_readout":1,"readout_groups":16,"readout_scale":0.35}
1703       {"enable_parallel_residual_at_step":0,"parallel_residual_start":7}
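Because some PRs appear on several rows and some rows carry no parameters, it can help to group the entries by PR number before comparing them. The following is a minimal sketch, assuming each row is a PR number optionally followed by a JSON object; the helper names (parse_row, group_by_pr, prs_using) are hypothetical and not part of any tool referenced on this page.

```python
import json
import re
from collections import defaultdict

# A few rows copied from the table above; a row may have no parameters.
ROWS = [
    '895 {"learned_scales":true}',
    '895',
    '1529 {"parallel_residual_start":8}',
    '1703 {"use_pass_readout":1,"readout_groups":16,"readout_scale":0.35}',
    '1703 {"enable_parallel_residual_at_step":0,"parallel_residual_start":7}',
]

# Assumed row shape: leading digits (the PR number), then an optional JSON object.
ROW_RE = re.compile(r"^(\d+)\s*(\{.*\})?$")


def parse_row(line):
    """Split one row into (pr_number, parameter dict); missing parameters become {}."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    pr_number = int(match.group(1))
    params = json.loads(match.group(2)) if match.group(2) else {}
    return pr_number, params


def group_by_pr(lines):
    """Collect every parameter dict recorded for each PR, preserving row order."""
    grouped = defaultdict(list)
    for line in lines:
        pr_number, params = parse_row(line)
        grouped[pr_number].append(params)
    return dict(grouped)


def prs_using(grouped, key):
    """Return the sorted PR numbers that set the given hyperparameter in any row."""
    return sorted(
        pr for pr, entries in grouped.items() if any(key in e for e in entries)
    )


grouped = group_by_pr(ROWS)
print(prs_using(grouped, "parallel_residual_start"))  # → [1529, 1703]
```

Keeping a list of dicts per PR, rather than merging them, avoids silently overwriting a key when the same PR records several configurations.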