PR #1387

open

Non-record submission: Depth-Recurrent U-Net Transformer

by Muhammad-Ahmed-Rayyan
val_bpb: 1.2919
Architecture: Transformer
Optimizer:
Artifact Size: 14.43 MB

Training Techniques

Architecture
depth recurrence
Applies a shared set of transformer blocks repeatedly across depth instead of allocating independent weights per layer.
parameters: {"num_recurrences":9}
weight tying
Shares weights across repeated encoder and decoder blocks.
parameters: {"shared_blocks":2}
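The two entries above combine: a pool of 2 parameter sets is applied 9 times, giving an effective depth of 18 with the parameter count of 2 layers. A minimal sketch of that loop, with the block internals reduced to a norm-plus-residual linear map (the names `SharedBlock` and `depth_recurrent_forward` are illustrative, not the PR's code):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Stand-in for the block's normalization.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

class SharedBlock:
    """One transformer-style block, simplified to a residual linear map."""
    def __init__(self, dim, rng):
        self.w = rng.standard_normal((dim, dim)) * 0.02

    def __call__(self, x):
        return x + rms_norm(x) @ self.w

def depth_recurrent_forward(x, blocks, num_recurrences):
    # Reuse the same small set of shared blocks at every recurrence step
    # instead of allocating an independent block per layer.
    for _ in range(num_recurrences):
        for block in blocks:
            x = block(x)
    return x

rng = np.random.default_rng(0)
dim = 8
blocks = [SharedBlock(dim, rng) for _ in range(2)]          # shared_blocks = 2
x = rng.standard_normal((4, dim))
y = depth_recurrent_forward(x, blocks, num_recurrences=9)   # 18 block calls, 2 parameter sets
```

Only the loop structure is the point here: parameter count scales with `len(blocks)`, compute with `len(blocks) * num_recurrences`.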
U-Net skip connections
Preserves the baseline U-Net encoder-decoder skip structure with unique skip weights.
parameters: null
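The skip structure can coexist with weight tying because the skip weights are separate scalars, untied even when the blocks themselves share parameters. A hedged sketch of that pattern (the helper `unet_forward` and the 0.5 skip values are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
w_shared = rng.standard_normal((dim, dim)) * 0.02

def shared_block(x):
    # The same weights serve both halves of the U (weight tying).
    return x + x @ w_shared

def unet_forward(x, depth, skip_weights):
    # Encoder half records activations; the decoder half pops them in
    # reverse order, each skip scaled by its own unique learned weight.
    skips = []
    for _ in range(depth):
        x = shared_block(x)
        skips.append(x)
    for lam in skip_weights:
        x = shared_block(x + lam * skips.pop())
    return x

skip_weights = np.full(3, 0.5)   # one untied scalar per skip connection
x = rng.standard_normal((4, dim))
y = unet_forward(x, depth=3, skip_weights=skip_weights)
```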
KV head count
Uses grouped-query attention: 16 query heads share 8 key/value heads in the transformer configuration.
parameters: {"num_heads":16,"num_kv_heads":8}
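With these parameters, each pair of query heads attends over one shared key/value head, halving the KV projection parameters. A self-contained numpy sketch of the mechanism (simplified single-example attention, not the submission's implementation):

```python
import numpy as np

def grouped_kv_attention(q, k, v, num_heads=16, num_kv_heads=8):
    # q: (T, num_heads, d); k, v: (S, num_kv_heads, d).
    # Each group of num_heads // num_kv_heads query heads shares one KV head,
    # realized here by repeating each KV head across its group.
    group = num_heads // num_kv_heads
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    d = q.shape[-1]
    scores = np.einsum("thd,shd->hts", q, k) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hts,shd->thd", weights, v)

rng = np.random.default_rng(2)
q = rng.standard_normal((5, 16, 4))
k = rng.standard_normal((5, 8, 4))
v = rng.standard_normal((5, 8, 4))
out = grouped_kv_attention(q, k, v)
```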
Quantization
int8
bits: 8
scope: all
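With bits: 8 and scope: all, every weight tensor is stored as int8. The exact scheme is not stated in the submission; a common symmetric per-tensor variant, shown as an assumption, maps the maximum absolute value to 127:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8: scale so that max |w| maps to 127.
    # (Assumed scheme; the PR does not specify its quantizer.)
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(3)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

Under this scheme the round-trip error per weight is bounded by half the scale step.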
Compression
zlib
level: null
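A null level presumably means zlib's default compression level. Quantized int8 weights often have low byte entropy, so zlib can shrink the serialized artifact further. A minimal sketch of that final packing step (the repeating test array is illustrative, not real weights):

```python
import zlib
import numpy as np

# Stand-in for serialized int8 weights; a repetitive pattern compresses well.
q = np.tile(np.arange(-4, 4, dtype=np.int8), 1024)
raw = q.tobytes()

packed = zlib.compress(raw)          # no level given -> zlib's default level
restored = zlib.decompress(packed)   # compression is lossless
```

Decompression then yields the exact int8 bytes back, so only quantization, not compression, affects model quality.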

Novel Contributions

  • Depth-recurrent U-Net transformer using weight-tied encoder and decoder blocks
  • Application of the Universal Transformer idea to the challenge baseline
  • Reallocation of parameter budget from depth diversity to wider representations
  • Preservation of U-Net skip connections with shared recurrent blocks