← Back to Architecture

GatedDeltaNet / Flash Linear Attention

Architecture
Used in
1 PRs
Best BPB
1.0339
Avg BPB
1.0339

Hyperparameters Across PRs

pr_numberparameters
1705{"layers":10,"model_dim":544}