GatedDeltaNet
Architecture
Used in: 12 PRs
Best BPB: 1.0098
Avg BPB: 1.1056
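
The summary figures above can be reproduced from the scores in the Submissions list below. The following is a minimal Python sketch, assuming BPB means bits per byte (lower is better) and that the average is a plain mean over all 12 listed entries, including the repeated PR #939 row. The variable names are illustrative and not taken from the page.

```python
from statistics import mean

# (PR number, author, BPB) copied from the Submissions list on this page.
submissions = [
    (939, "brian386", 1.2519),
    (939, "brian386", 1.2519),  # listed twice on the page; kept as-is
    (969, "dnldsz", 1.2907),
    (970, "dnldsz", 1.2907),
    (1544, "Abhishek8108", 1.0283),
    (1545, "Abhishek8108", 1.0283),
    (1672, "andrewbaggio1", 1.0119),
    (1687, "resouer", 1.0409),
    (1698, "arsenis-cmd", 1.0099),
    (1711, "aamodbhatt", 1.0098),
    (1712, "aamodbhatt", 1.0190),
    (1791, "genji0306", 1.0339),
]

scores = [bpb for _, _, bpb in submissions]

count = len(scores)               # number of listed entries
best_bpb = min(scores)            # ASSUMPTION: lower BPB is better
avg_bpb = round(mean(scores), 4)  # plain mean, rounded to 4 decimals

print(count, best_bpb, avg_bpb)
```

Under these assumptions the computed count, minimum, and mean agree with the "Used in", "Best BPB", and "Avg BPB" values shown on the page.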
Submissions

| PR | Author | BPB |
|---|---|---|
| #939 | brian386 | 1.2519 |
| #939 | brian386 | 1.2519 |
| #969 | dnldsz | 1.2907 |
| #970 | dnldsz | 1.2907 |
| #1544 | Abhishek8108 | 1.0283 |
| #1545 | Abhishek8108 | 1.0283 |
| #1672 | andrewbaggio1 | 1.0119 |
| #1687 | resouer | 1.0409 |
| #1698 | arsenis-cmd | 1.0099 |
| #1711 | aamodbhatt | 1.0098 |
| #1712 | aamodbhatt | 1.0190 |
| #1791 | genji0306 | 1.0339 |

Hyperparameters Across PRs

| PR | Parameters |
|---|---|
| 939 | {"heads":4,"head_dim":128} |
| 939 | — |
| 969 | {"layers":12,"dimensions":384,"chunk_size":64} |
| 970 | {"layers":12,"dimensions":384,"head_dim":64,"heads_per_layer":6,"chunk_size":64} |
| 1544 | {"layers":10,"head_dim":64,"use_short_conv":true} |
| 1545 | {"layers":10,"head_dim":64,"use_short_conv":true} |
| 1672 | {"layers":5,"dim":512,"heads":8,"head_dim":64} |
| 1687 | {"kv_sharing_stride":2,"num_swa_layers":0} |
| 1698 | {"layers":10,"dimensions":544,"heads":8,"kv_sharing_stride":2} |
| 1711 | {"layers":10,"dimensions":544,"heads":8} |
| 1712 | {"layers":10,"dimensions":544,"heads":8} |
| 1791 | {"layers":10,"model_dim":544,"heads":8,"head_dim":64,"kv_sharing_stride":2} |
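
The rows above record model width under three different keys (`dimensions`, `dim`, `model_dim`). The sketch below shows one way to parse the parameter JSON and compare widths across PRs. It assumes these three keys refer to the same quantity, which the page does not confirm. The helper `model_width` and the alias list are hypothetical names introduced for illustration.

```python
import json

# Rows copied from the table above; the row with no recorded parameters is None.
rows = [
    (939, '{"heads":4,"head_dim":128}'),
    (939, None),
    (969, '{"layers":12,"dimensions":384,"chunk_size":64}'),
    (970, '{"layers":12,"dimensions":384,"head_dim":64,"heads_per_layer":6,"chunk_size":64}'),
    (1544, '{"layers":10,"head_dim":64,"use_short_conv":true}'),
    (1545, '{"layers":10,"head_dim":64,"use_short_conv":true}'),
    (1672, '{"layers":5,"dim":512,"heads":8,"head_dim":64}'),
    (1687, '{"kv_sharing_stride":2,"num_swa_layers":0}'),
    (1698, '{"layers":10,"dimensions":544,"heads":8,"kv_sharing_stride":2}'),
    (1711, '{"layers":10,"dimensions":544,"heads":8}'),
    (1712, '{"layers":10,"dimensions":544,"heads":8}'),
    (1791, '{"layers":10,"model_dim":544,"heads":8,"head_dim":64,"kv_sharing_stride":2}'),
]

# ASSUMPTION: these keys all name the model width; the page does not say so.
WIDTH_ALIASES = ("dimensions", "dim", "model_dim")


def model_width(params):
    """Return the first width-like value found in params, or None."""
    for key in WIDTH_ALIASES:
        if key in params:
            return params[key]
    return None


# Parse each JSON string; rows without parameters become an empty dict.
parsed = [(pr, json.loads(raw) if raw else {}) for pr, raw in rows]

# Map PR number to width for the rows that report one.
widths = {}
for pr, params in parsed:
    width = model_width(params)
    if width is not None:
        widths[pr] = width

print(widths)
```

PRs that report no width-like key (for example #939, #1544, #1545, and #1687) are left out of `widths` rather than filled with a guessed value.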