← Back to Architecture

hierarchical token processing

Architecture
Used in
1 PRs
Best BPB
0.6846
Avg BPB
0.6846

Hyperparameters Across PRs

pr_numberparameters
1121{"merge_factor":2,"encoder_layers":5,"decoder_layers":6,"d_model":512}