← Back to Architecture

Factored tied embedding

Architecture
Used in
2 PRs
Best BPB
1.1239
Avg BPB
1.1404

Hyperparameters Across PRs

pr_numberparameters
640
641{"vocab_size":8192,"embedding_dim":254}