
GatedDeltaNet

Architecture
Used in: 11 PRs
Best BPB: 1.0098
Avg BPB: 1.1121

Hyperparameters Across PRs

pr_number  parameters
939        {"heads":4,"head_dim":128}
939
969        {"layers":12,"dimensions":384,"chunk_size":64}
970        {"layers":12,"dimensions":384,"head_dim":64,"heads_per_layer":6,"chunk_size":64}
1544       {"layers":10,"head_dim":64,"use_short_conv":true}
1545       {"layers":10,"head_dim":64,"use_short_conv":true}
1672       {"layers":5,"dim":512,"heads":8,"head_dim":64}
1687       {"kv_sharing_stride":2,"num_swa_layers":0}
1698       {"layers":10,"dimensions":544,"heads":8,"kv_sharing_stride":2}
1711       {"layers":10,"dimensions":544,"heads":8}
1712       {"layers":10,"dimensions":544,"heads":8}
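
The rows above use inconsistent key names ("dim" vs "dimensions", "heads" vs "heads_per_layer"), which makes them awkward to compare directly. The following is a minimal sketch of how the rows could be loaded and normalized. The alias map, the helper names `normalize` and `width_check`, and the idea that the model width equals heads × head_dim are assumptions made for illustration; the page itself does not state them.

```python
import json

# Rows copied from the table as (pr_number, parameters JSON) pairs.
# The second 939 row has no parameters listed, so its JSON string is empty.
ROWS = [
    (939, '{"heads":4,"head_dim":128}'),
    (939, ''),
    (969, '{"layers":12,"dimensions":384,"chunk_size":64}'),
    (970, '{"layers":12,"dimensions":384,"head_dim":64,"heads_per_layer":6,"chunk_size":64}'),
    (1544, '{"layers":10,"head_dim":64,"use_short_conv":true}'),
    (1545, '{"layers":10,"head_dim":64,"use_short_conv":true}'),
    (1672, '{"layers":5,"dim":512,"heads":8,"head_dim":64}'),
    (1687, '{"kv_sharing_stride":2,"num_swa_layers":0}'),
    (1698, '{"layers":10,"dimensions":544,"heads":8,"kv_sharing_stride":2}'),
    (1711, '{"layers":10,"dimensions":544,"heads":8}'),
    (1712, '{"layers":10,"dimensions":544,"heads":8}'),
]

# Hypothetical alias map: treating these key pairs as synonyms is an assumption.
ALIASES = {"dimensions": "dim", "heads_per_layer": "heads"}


def normalize(raw):
    """Parse one parameters cell and rename aliased keys; empty cell -> {}."""
    params = json.loads(raw) if raw else {}
    return {ALIASES.get(key, key): value for key, value in params.items()}


def width_check(params):
    """Return whether heads * head_dim == dim, or None if any key is missing.

    That the width factors this way is an assumption, not a stated fact.
    """
    if not all(key in params for key in ("dim", "heads", "head_dim")):
        return None
    return params["heads"] * params["head_dim"] == params["dim"]


if __name__ == "__main__":
    for pr_number, raw in ROWS:
        params = normalize(raw)
        print(pr_number, params, width_check(params))
```

Only PRs 970 and 1672 list all three of width, head count, and head dimension, so they are the only rows for which the check returns a definite answer; every other row yields `None`.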