GatedDeltaNet
Architecture
Used in: 11 PRs
Best BPB: 1.0098
Avg BPB: 1.1121
Submissions
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 939 | {"heads":4,"head_dim":128} |
| 939 | — |
| 969 | {"layers":12,"dimensions":384,"chunk_size":64} |
| 970 | {"layers":12,"dimensions":384,"head_dim":64,"heads_per_layer":6,"chunk_size":64} |
| 1544 | {"layers":10,"head_dim":64,"use_short_conv":true} |
| 1545 | {"layers":10,"head_dim":64,"use_short_conv":true} |
| 1672 | {"layers":5,"dim":512,"heads":8,"head_dim":64} |
| 1687 | {"kv_sharing_stride":2,"num_swa_layers":0} |
| 1698 | {"layers":10,"dimensions":544,"heads":8,"kv_sharing_stride":2} |
| 1711 | {"layers":10,"dimensions":544,"heads":8} |
| 1712 | {"layers":10,"dimensions":544,"heads":8} |
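
The parameter cells above are JSON objects, so they can be loaded programmatically to compare settings across PRs. The following is a minimal sketch, not part of the original page: `ROWS`, `KEY_ALIASES`, `parse_params`, and `values_for` are hypothetical names, and treating `dim` as an alias of `dimensions` is an assumption (the table does not say the two keys mean the same thing).

```python
import json

# Rows copied verbatim from the table above: (PR number, raw parameter cell).
ROWS = [
    (939, '{"heads":4,"head_dim":128}'),
    (939, "—"),  # placeholder cell: no parameters reported
    (969, '{"layers":12,"dimensions":384,"chunk_size":64}'),
    (970, '{"layers":12,"dimensions":384,"head_dim":64,"heads_per_layer":6,"chunk_size":64}'),
    (1544, '{"layers":10,"head_dim":64,"use_short_conv":true}'),
    (1545, '{"layers":10,"head_dim":64,"use_short_conv":true}'),
    (1672, '{"layers":5,"dim":512,"heads":8,"head_dim":64}'),
    (1687, '{"kv_sharing_stride":2,"num_swa_layers":0}'),
    (1698, '{"layers":10,"dimensions":544,"heads":8,"kv_sharing_stride":2}'),
    (1711, '{"layers":10,"dimensions":544,"heads":8}'),
    (1712, '{"layers":10,"dimensions":544,"heads":8}'),
]

# Assumption: "dim" and "dimensions" both refer to the model width.
KEY_ALIASES = {"dim": "dimensions"}


def parse_params(cell):
    """Return a dict for a JSON cell, or an empty dict for the placeholder."""
    if cell.strip() == "—":
        return {}
    raw = json.loads(cell)
    # Rename aliased keys so rows can be compared under one name.
    return {KEY_ALIASES.get(key, key): value for key, value in raw.items()}


configs = [(pr, parse_params(cell)) for pr, cell in ROWS]


def values_for(key):
    """Collect (PR number, value) pairs for every row that reports `key`."""
    return [(pr, cfg[key]) for pr, cfg in configs if key in cfg]


if __name__ == "__main__":
    for key in ("layers", "dimensions", "heads", "head_dim"):
        print(key, values_for(key))
```

Rows that omit a key are skipped rather than filled with a default, since the table does not say what the unreported values were.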