DeltaNet

Architecture
Used in: 4 PRs
Best BPB: 0.7614
Avg BPB: 0.8876

Hyperparameters Across PRs

pr_number  parameters
990        {"heads":4}
1028       {"heads":4,"short_conv":true,"loops":4,"flat_layers":4,"crawler_layers":1}
1047       {"heads":4}
1286       {"layers":8,"final_attention_layer":1,"n_embd":384}
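
The parameter sets listed above differ in which keys they set, so it can help to index them by hyperparameter name. The following is a minimal sketch, assuming the rows are copied by hand into a list; the names `ROWS`, `parse_rows`, and `keys_by_pr` are hypothetical helpers for illustration and not part of any PR.

```python
import json

# Rows copied from the listing above: (pr_number, parameters as JSON text).
ROWS = [
    (990, '{"heads":4}'),
    (1028, '{"heads":4,"short_conv":true,"loops":4,"flat_layers":4,"crawler_layers":1}'),
    (1047, '{"heads":4}'),
    (1286, '{"layers":8,"final_attention_layer":1,"n_embd":384}'),
]


def parse_rows(rows):
    """Return {pr_number: dict of hyperparameters} (hypothetical helper)."""
    return {pr: json.loads(text) for pr, text in rows}


def keys_by_pr(params):
    """Map each hyperparameter name to the sorted list of PRs that set it."""
    index = {}
    for pr, cfg in params.items():
        for key in cfg:
            index.setdefault(key, []).append(pr)
    return {key: sorted(prs) for key, prs in index.items()}


params = parse_rows(ROWS)
index = keys_by_pr(params)

# Print one line per hyperparameter, listing the PRs that specify it.
for key in sorted(index):
    print(f"{key}: {index[key]}")
```

Read this way, `heads` is set in three of the four PRs, while PR 1286 records a different set of keys (`layers`, `final_attention_layer`, `n_embd`) and does not list `heads` at all.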