Gated DeltaNet

Architecture
Used in: 8 PRs
Best BPB: 1.0030
Avg BPB: 1.1246

Hyperparameters Across PRs

pr_number  parameters
1370       {"layers":10,"dim":512,"heads":1,"expand_k":1,"expand_v":2}
1371       {"layers":9,"gdn_layers":7,"attention_layers":2,"ratio":"3:1","head_dim_ratio":0.75,"expand_v":2,"conv_size":4}
1479       {"layers":8}
1553       {"layers":10,"head_dim":64,"expand_v":1,"use_short_conv":true}
1562       {"layers":5}
1563
1632       {"layers":10}
1749       {"layers":10}
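
Because each parameters cell is a JSON object, the rows can be compared programmatically. The following is a minimal sketch, assuming the rows are transcribed by hand from the table above; the names ROWS, parse_rows, and layers_by_pr are hypothetical and not part of any tooling behind this page. PR 1563 lists no parameters, so it is kept as an empty entry rather than filled in.

```python
import json
from collections import Counter

# Rows transcribed from the table above: (pr_number, raw JSON string).
# PR 1563 lists no parameters, so its raw string is left empty.
ROWS = [
    (1370, '{"layers":10,"dim":512,"heads":1,"expand_k":1,"expand_v":2}'),
    (1371, '{"layers":9,"gdn_layers":7,"attention_layers":2,"ratio":"3:1",'
           '"head_dim_ratio":0.75,"expand_v":2,"conv_size":4}'),
    (1479, '{"layers":8}'),
    (1553, '{"layers":10,"head_dim":64,"expand_v":1,"use_short_conv":true}'),
    (1562, '{"layers":5}'),
    (1563, ''),
    (1632, '{"layers":10}'),
    (1749, '{"layers":10}'),
]


def parse_rows(rows):
    """Return {pr_number: dict}; an empty dict when no parameters are listed."""
    return {pr: (json.loads(raw) if raw else {}) for pr, raw in rows}


params = parse_rows(ROWS)

# Keep only the PRs that actually report a layer count.
layers_by_pr = {pr: p["layers"] for pr, p in params.items() if "layers" in p}

# Count how often each layer count appears across PRs.
layer_counts = Counter(layers_by_pr.values())

print(layers_by_pr)
print(layer_counts.most_common(1))
```

Keeping missing values as empty dicts instead of guessing defaults means any summary covers only what the PRs actually report.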