PR #1387 (open)
Non-record submission: Depth-Recurrent U-Net Transformer
by Muhammad-Ahmed-Rayyan (View on GitHub)
val_bpb: 1.2919
Architecture: Transformer
Optimizer: —
Artifact Size: 14.43 MB
Training Techniques
Architecture
depth recurrence
Reuses shared transformer blocks across depth instead of using independent layers.
parameters: {"num_recurrences":9}
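A minimal sketch of what depth recurrence means under the stated parameters ({"num_recurrences": 9}): one parameter set applied nine times. The residual-tanh block is a stand-in for the real transformer block, chosen only for illustration; names and shapes are assumptions, not the submission's code.

```python
import numpy as np

def shared_block(x, W):
    # Stand-in for one transformer block: a residual tanh layer (illustrative only).
    return x + np.tanh(x @ W)

def depth_recurrent_forward(x, W, num_recurrences=9):
    # Depth recurrence: reuse the SAME weights at every depth step, so compute
    # scales with num_recurrences while the parameter count does not.
    for _ in range(num_recurrences):
        x = shared_block(x, W)
    return x

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1   # one parameter tensor, reused 9 times
x = rng.standard_normal((2, 8))
y = depth_recurrent_forward(x, W)       # depth-9 compute, depth-1 parameter cost
```

This is the trade the technique makes: nine layers' worth of forward compute for one layer's worth of stored weights.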
weight tying
Shares weights across repeated encoder and decoder blocks.
parameters: {"shared_blocks":2}
U-Net skip connections
Preserves the baseline U-Net encoder-decoder skip structure with unique skip weights.
parameters: null
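Putting weight tying and the skip structure together: with {"shared_blocks": 2}, one block can be tied across the encoder path and one across the decoder path, while each skip connection keeps its own untied weight. A hedged sketch, where the encoder/decoder depth of 3, the additive skips, and the scalar skip gains are all illustrative assumptions:

```python
import numpy as np

def tied_block(x, W):
    # Stand-in for a transformer block (illustrative residual layer).
    return x + np.tanh(x @ W)

def unet_forward(x, W_enc, W_dec, skip_w):
    # Two shared blocks: W_enc is tied across the encoder path, W_dec across the
    # decoder path. Skips keep unique per-level weights (here scalar gains).
    skips = []
    for _ in range(len(skip_w)):            # encoder: same W_enc at every level
        x = tied_block(x, W_enc)
        skips.append(x)
    for i in reversed(range(len(skip_w))):  # decoder: same W_dec, skips in LIFO order
        x = tied_block(x + skip_w[i] * skips[i], W_dec)
    return x

rng = np.random.default_rng(0)
d = 8
W_enc, W_dec = (rng.standard_normal((d, d)) * 0.1 for _ in range(2))
skip_w = [0.5, 0.5, 0.5]                    # unique, untied skip weights (hypothetical values)
x = rng.standard_normal((2, d))
y = unet_forward(x, W_enc, W_dec, skip_w)
```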
KV head count
Uses grouped-query attention: pairs of query heads share one key/value head, halving the KV cache.
parameters: {"num_heads":16,"num_kv_heads":8}
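With num_heads=16 and num_kv_heads=8, each key/value head serves two query heads. A numpy sketch of grouped-query attention under these parameters (single sequence, no masking or projections, which are omitted for brevity):

```python
import numpy as np

def grouped_attention(q, k, v, num_heads=16, num_kv_heads=8):
    # q: (T, num_heads, d); k, v: (T, num_kv_heads, d).
    # Each group of num_heads // num_kv_heads query heads shares one KV head.
    group = num_heads // num_kv_heads          # 2 query heads per KV head
    k = np.repeat(k, group, axis=1)            # broadcast shared KV heads to all query heads
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("thd,shd->hts", q, k) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # softmax over source positions
    return np.einsum("hts,shd->thd", weights, v)

T, d = 5, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((T, 16, d))
k = rng.standard_normal((T, 8, d))            # half as many KV heads as query heads
v = rng.standard_normal((T, 8, d))
out = grouped_attention(q, k, v)              # full 16-head output, half the KV storage
```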
Quantization
int8 (bits: 8, scope: all)
Compression
zlib (level: null)
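The artifact pipeline pairs int8 quantization over all weights with zlib. A minimal sketch assuming symmetric per-tensor scaling (the submission's exact scheme and zlib level are not specified):

```python
import numpy as np
import zlib

def quantize_int8(w):
    # Symmetric per-tensor int8: scale so the max magnitude maps to 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)                   # 4 bytes/weight -> 1 byte/weight
packed = zlib.compress(q.tobytes())           # zlib's default level, since none is given
err = np.abs(w - q.astype(np.float32) * scale).max()
```

Quantization does the heavy lifting (a fixed 4x over float32); zlib then squeezes out whatever statistical redundancy remains in the int8 bytes.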
Novel Contributions
- Depth-recurrent U-Net transformer using weight-tied encoder and decoder blocks
- Application of the Universal Transformer idea to the challenge baseline
- Reallocation of parameter budget from depth diversity to wider representations
- Preservation of U-Net skip connections with shared recurrent blocks