Tensor-Train attention
Category: Architecture
Used in: 1 PR
Best BPB: 1.1691
Avg BPB: 1.1691
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 1888 | {"layers":13,"d_model":512,"rank":8,"mode_shape":[8,8,8]} |
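In the PR 1888 row, the product of `mode_shape` (8 × 8 × 8) equals `d_model` (512), which is consistent with a `d_model × d_model` projection being stored in tensor-train (TT-matrix) form. The sketch below is a rough illustration of what such a factorization would cost in parameters. It is not taken from the PR: it assumes that the same mode shape is used for the input and output dimensions, that all internal TT ranks equal `rank`, and that no biases are counted. The helper `tt_matrix_param_count` is a hypothetical name introduced here.

```python
import json
from math import prod

# Hyperparameters copied verbatim from the PR 1888 row of the table above.
params = json.loads('{"layers":13,"d_model":512,"rank":8,"mode_shape":[8,8,8]}')


def tt_matrix_param_count(in_modes, out_modes, rank):
    """Count parameters of a TT-matrix (hypothetical helper, not from the PR).

    Assumes core k has shape (r_{k-1}, in_modes[k], out_modes[k], r_k),
    with boundary ranks r_0 = r_d = 1 and every internal rank equal to `rank`.
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    ranks = [1] + [rank] * (len(in_modes) - 1) + [1]
    return sum(
        ranks[k] * m * n * ranks[k + 1]
        for k, (m, n) in enumerate(zip(in_modes, out_modes))
    )


modes = params["mode_shape"]
d_model = params["d_model"]

# The modes must multiply out to the dimension they factorize.
mode_product = prod(modes)

# Assumption: one square d_model x d_model projection, same modes on both sides.
dense_params = d_model * d_model
tt_params = tt_matrix_param_count(modes, modes, params["rank"])

print(f"product of modes : {mode_product}")
print(f"dense parameters : {dense_params}")
print(f"TT parameters    : {tt_params}")
print(f"compression      : {dense_params / tt_params:.1f}x")
```

Under these assumptions, a single 512 × 512 projection takes 5,120 parameters in TT form instead of 262,144 in dense form. The actual layout in the PR (which projections are factorized, and how ranks are set per core) may differ.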