
Tensor-Train attention

Category: Architecture
Used in: 1 PR
Best BPB (bits per byte): 1.1691
Avg BPB: 1.1691

Hyperparameters Across PRs

pr_number   parameters
1888        {"layers": 13, "d_model": 512, "rank": 8, "mode_shape": [8, 8, 8]}
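One way to read this configuration: `mode_shape` factors `d_model` (8 × 8 × 8 = 512), and `rank` is the internal TT-rank of a 512 × 512 weight stored as a tensor-train matrix with three small cores. The Python sketch below works through that arithmetic. It assumes the same mode shape on the input and output side, that every internal rank equals `rank`, and that the factorized weights are the attention projections. The page does not confirm any of this, and the helper names (`tt_core_shapes`, `tt_param_count`, `tt_entry`) are hypothetical.

```python
from math import prod

# Hyperparameters copied from the PR 1888 row of the table above.
CONFIG = {"layers": 13, "d_model": 512, "rank": 8, "mode_shape": [8, 8, 8]}


def tt_core_shapes(in_modes, out_modes, rank):
    """Core shapes (r_{k-1}, m_k, n_k, r_k) of a TT-matrix.

    Assumes one uniform internal TT-rank and boundary ranks r_0 = r_d = 1.
    """
    d = len(in_modes)
    ranks = [1] + [rank] * (d - 1) + [1]
    return [(ranks[k], in_modes[k], out_modes[k], ranks[k + 1]) for k in range(d)]


def tt_param_count(in_modes, out_modes, rank):
    """Total number of scalars stored across all TT cores."""
    return sum(prod(shape) for shape in tt_core_shapes(in_modes, out_modes, rank))


def tt_entry(cores, row, col):
    """Entry W[row, col] of a TT-matrix, with row = (i_1..i_d), col = (j_1..j_d).

    cores[k][a][i][j][b] holds G_k(a, i, j, b). The entry is the product of the
    r_{k-1} x r_k slices G_k(:, i_k, j_k, :), which collapses to a 1 x 1 result.
    """
    vec = [1.0]  # running 1 x r_{k-1} row vector; starts at r_0 = 1
    for core, i, j in zip(cores, row, col):
        r_in, r_out = len(core), len(core[0][i][j])
        vec = [sum(vec[a] * core[a][i][j][b] for a in range(r_in)) for b in range(r_out)]
    return vec[0]


modes = CONFIG["mode_shape"]
assert prod(modes) == CONFIG["d_model"]  # 8 * 8 * 8 == 512, so the modes factor d_model

tt = tt_param_count(modes, modes, CONFIG["rank"])  # 512 + 4096 + 512
dense = CONFIG["d_model"] ** 2
print(tt, dense, dense / tt)  # 5120 262144 51.2

# Hypothetical: if Q, K, V and O in all 13 layers each used this format.
n_proj = 4
print(n_proj * CONFIG["layers"] * tt, n_proj * CONFIG["layers"] * dense)  # 266240 13631488
```

Under these assumptions each factorized projection stores 5,120 scalars instead of 262,144, roughly a 51× reduction. The middle core dominates the count because its size grows with the square of the rank (8 · 8 · 8 · 8 = 4,096).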