PR #1387

open

Non-record submission: Depth-Recurrent U-Net Transformer

by Muhammad-Ahmed-Rayyan
val_bpb: 1.2919
Architecture: Transformer
Optimizer:
Artifact Size: 14.43 MB

Training Techniques

Architecture
depth recurrence
Applies a shared set of transformer blocks repeatedly across depth instead of allocating independent weights per layer.
parameters: {"num_recurrences":9}
weight tying
Shares weights across repeated encoder and decoder blocks.
parameters: {"shared_blocks":2}
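The two entries above combine: a pool of 2 parameter sets is applied 9 times, giving an effective depth of 18 with the parameter count of 2 layers. A minimal sketch of that loop, with the block internals reduced to a norm-plus-residual linear map (the names `SharedBlock` and `depth_recurrent_forward` are illustrative, not the PR's code):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Stand-in for the block's normalization.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

class SharedBlock:
    """One transformer-style block, simplified to a residual linear map."""
    def __init__(self, dim, rng):
        self.w = rng.standard_normal((dim, dim)) * 0.02

    def __call__(self, x):
        return x + rms_norm(x) @ self.w

def depth_recurrent_forward(x, blocks, num_recurrences):
    # Reuse the same small set of shared blocks at every recurrence step
    # instead of allocating an independent block per layer.
    for _ in range(num_recurrences):
        for block in blocks:
            x = block(x)
    return x

rng = np.random.default_rng(0)
dim = 8
blocks = [SharedBlock(dim, rng) for _ in range(2)]          # shared_blocks = 2
x = rng.standard_normal((4, dim))
y = depth_recurrent_forward(x, blocks, num_recurrences=9)   # 18 block calls, 2 parameter sets
```

Only the loop structure is the point here: parameter count scales with `len(blocks)`, compute with `len(blocks) * num_recurrences`.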
U-Net skip connections
Preserves the baseline U-Net encoder-decoder skip structure with unique skip weights.
parameters: null
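The skip structure can coexist with weight tying because the skip weights are separate scalars, untied even when the blocks themselves share parameters. A hedged sketch of that pattern (the helper `unet_forward` and the 0.5 skip values are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
w_shared = rng.standard_normal((dim, dim)) * 0.02

def shared_block(x):
    # The same weights serve both halves of the U (weight tying).
    return x + x @ w_shared

def unet_forward(x, depth, skip_weights):
    # Encoder half records activations; the decoder half pops them in
    # reverse order, each skip scaled by its own unique learned weight.
    skips = []
    for _ in range(depth):
        x = shared_block(x)
        skips.append(x)
    for lam in skip_weights:
        x = shared_block(x + lam * skips.pop())
    return x

skip_weights = np.full(3, 0.5)   # one untied scalar per skip connection
x = rng.standard_normal((4, dim))
y = unet_forward(x, depth=3, skip_weights=skip_weights)
```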
KV head count
Uses grouped-query attention: 16 query heads share 8 key/value heads in the transformer configuration.
parameters: {"num_heads":16,"num_kv_heads":8}
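With these parameters, each pair of query heads attends over one shared key/value head, halving the KV projection parameters. A self-contained numpy sketch of the mechanism (simplified single-example attention, not the submission's implementation):

```python
import numpy as np

def grouped_kv_attention(q, k, v, num_heads=16, num_kv_heads=8):
    # q: (T, num_heads, d); k, v: (S, num_kv_heads, d).
    # Each group of num_heads // num_kv_heads query heads shares one KV head,
    # realized here by repeating each KV head across its group.
    group = num_heads // num_kv_heads
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    d = q.shape[-1]
    scores = np.einsum("thd,shd->hts", q, k) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hts,shd->thd", weights, v)

rng = np.random.default_rng(2)
q = rng.standard_normal((5, 16, 4))
k = rng.standard_normal((5, 8, 4))
v = rng.standard_normal((5, 8, 4))
out = grouped_kv_attention(q, k, v)
```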
Quantization
int8
bits: 8
scope: all
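With bits: 8 and scope: all, every weight tensor is stored as int8. The exact scheme is not stated in the submission; a common symmetric per-tensor variant, shown as an assumption, maps the maximum absolute value to 127:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8: scale so that max |w| maps to 127.
    # (Assumed scheme; the PR does not specify its quantizer.)
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(3)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

Under this scheme the round-trip error per weight is bounded by half the scale step.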
Compression
zlib
level: null
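A null level presumably means zlib's default compression level. Quantized int8 weights often have low byte entropy, so zlib can shrink the serialized artifact further. A minimal sketch of that final packing step (the repeating test array is illustrative, not real weights):

```python
import zlib
import numpy as np

# Stand-in for serialized int8 weights; a repetitive pattern compresses well.
q = np.tile(np.arange(-4, 4, dtype=np.int8), 1024)
raw = q.tobytes()

packed = zlib.compress(raw)          # no level given -> zlib's default level
restored = zlib.decompress(packed)   # compression is lossless
```

Decompression then yields the exact int8 bytes back, so only quantization, not compression, affects model quality.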

Novel Contributions

  • Depth-recurrent U-Net transformer using weight-tied encoder and decoder blocks
  • Application of the Universal Transformer idea to the challenge baseline
  • Reallocation of parameter budget from depth diversity to wider representations
  • Preservation of U-Net skip connections with shared recurrent blocks