
Gated DeltaNet hybrid

Category: Architecture
Used in: 1 PR
Best BPB (bits per byte): 1.0171
Avg BPB: 1.0171

Hyperparameters Across PRs

PR number | Parameters
1564 | {"layers_pattern": "[GDN×5] → SWA → [GDN×5] → SWA_shared", "tokenizer": "SP1024"}
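The `layers_pattern` string reads as a compact layer schedule. The sketch below expands it into one entry per layer. It assumes that `→` separates segments in order and that `[NAME×K]` means K consecutive copies of NAME. This reading is inferred from the single example above, not from a documented format, and `expand_layers_pattern` is a hypothetical helper. Under that reading the pattern describes 12 layers: 10 Gated DeltaNet (GDN) blocks, with a presumably sliding-window attention (SWA) layer after each group of five. The `_shared` suffix on the last layer suggests it reuses weights from the earlier SWA layer, but the source does not say so.

```python
import re

# Hypothetical helper: the pattern syntax is inferred from PR 1564's config,
# not taken from any specification.
SEGMENT_REPEAT = re.compile(r"\[(\w+)×(\d+)\]")  # matches e.g. "[GDN×5]"


def expand_layers_pattern(pattern: str) -> list[str]:
    """Expand "[GDN×5] → SWA → ..." into an explicit list of layer names."""
    layers: list[str] = []
    for segment in pattern.split("→"):
        segment = segment.strip()
        match = SEGMENT_REPEAT.fullmatch(segment)
        if match:
            # "[NAME×K]": repeat NAME K times.
            name, count = match.group(1), int(match.group(2))
            layers.extend([name] * count)
        elif segment:
            # Bare name: a single layer (e.g. "SWA" or "SWA_shared").
            layers.append(segment)
    return layers


if __name__ == "__main__":
    pattern = "[GDN×5] → SWA → [GDN×5] → SWA_shared"
    for index, name in enumerate(expand_layers_pattern(pattern)):
        print(index, name)
```

Running it lists layers 0 to 4 and 6 to 10 as `GDN`, layer 5 as `SWA`, and layer 11 as `SWA_shared`.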