Hybrid ETD Transformer

Architecture
Used in: 1 PR
Best BPB: 1.1169
Avg BPB: 1.1169

Hyperparameters Across PRs

pr_number  parameters
1828       {"encoder_layers":3,"think_layers":3,"think_passes":3,"decoder_layers":4,"d_model":512,"heads":8,"kv_heads":4}
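The parameters for PR 1828 can be loaded and sanity-checked with a short script. The sketch below is illustrative only: the class name ETDConfig and its derived properties are hypothetical, and effective_depth assumes that the think layers are a shared block applied think_passes times in sequence, which this page does not confirm.

```python
import json
from dataclasses import dataclass

# Parameter string copied verbatim from the row for PR 1828.
RAW = ('{"encoder_layers":3,"think_layers":3,"think_passes":3,'
       '"decoder_layers":4,"d_model":512,"heads":8,"kv_heads":4}')


@dataclass(frozen=True)
class ETDConfig:
    """Hypothetical container for the hyperparameters listed above."""
    encoder_layers: int
    think_layers: int
    think_passes: int
    decoder_layers: int
    d_model: int
    heads: int
    kv_heads: int

    def __post_init__(self):
        # Basic consistency checks for multi-head / grouped-query attention.
        if self.d_model % self.heads != 0:
            raise ValueError("d_model must be divisible by heads")
        if self.heads % self.kv_heads != 0:
            raise ValueError("heads must be divisible by kv_heads")

    @property
    def unique_layers(self) -> int:
        # Distinct layer stacks, counting the think block once.
        return self.encoder_layers + self.think_layers + self.decoder_layers

    @property
    def effective_depth(self) -> int:
        # ASSUMPTION: the think block is re-applied think_passes times.
        return (self.encoder_layers
                + self.think_layers * self.think_passes
                + self.decoder_layers)

    @property
    def head_dim(self) -> int:
        return self.d_model // self.heads

    @property
    def queries_per_kv_head(self) -> int:
        # Number of query heads sharing each key/value head.
        return self.heads // self.kv_heads


cfg = ETDConfig(**json.loads(RAW))
print(cfg.unique_layers, cfg.effective_depth, cfg.head_dim, cfg.queries_per_kv_head)
# prints: 10 16 64 2
```

Under that assumption, the configuration has 10 distinct layers but runs 16 layer applications per forward pass, with 64-dimensional heads and two query heads per key/value head.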