val_bpb: 1.5283
Architecture: U-Net-style encoder/decoder with a recurrent shared layer
Optimizer: —
Artifact Size: —
Training Techniques
Architecture
depth recurrence
Replaces some unique layers with a single shared recurrent layer applied multiple times to save parameters while increasing effective depth.
parameters: {"unique_layers":7,"recurrent_passes":3,"effective_depth":10}
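A minimal sketch of the depth-recurrence idea with toy numpy "layers" (the width `D`, the tanh nonlinearity, and the residual form are assumptions; the card only specifies 7 unique layers and 3 recurrent passes):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical hidden width

# Seven unique layers, each with its own weights (as in the card).
unique_weights = [rng.standard_normal((D, D)) * 0.1 for _ in range(7)]
# One shared layer reused for every recurrent pass.
shared_weight = rng.standard_normal((D, D)) * 0.1

def layer(x, w):
    # Stand-in residual layer: x + tanh(x @ w).
    return x + np.tanh(x @ w)

def forward(x, recurrent_passes=3):
    for w in unique_weights:           # 7 unique layers
        x = layer(x, w)
    for _ in range(recurrent_passes):  # shared layer applied K times
        x = layer(x, shared_weight)    # same weights on every pass
    return x

x = rng.standard_normal((2, D))
y = forward(x)  # effective depth 7 + 3 = 10 from only 8 weight matrices
```

The shared layer adds depth without adding parameters: ten layer applications are backed by eight parameter sets.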
weight tying
Shares weights across recurrent passes of the same layer.
parameters: {"recurrent_passes":3}
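The parameter saving from tying weights across the 3 recurrent passes can be counted directly (the width `D` is a hypothetical placeholder):

```python
D, K = 8, 3  # hypothetical width; K = 3 recurrent passes as in the card

# Untied: K distinct layers would need K separate weight matrices.
untied_params = K * D * D
# Tied: one shared weight matrix is reused on all K passes.
tied_params = D * D

print(untied_params, tied_params)  # tied version uses 1/K the parameters
```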
iteration embeddings
Uses learned per-pass embeddings so the recurrent layer is aware of which iteration it is on.
parameters: {"passes":3}
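One way iteration embeddings could be wired in, as a sketch: a learned per-pass vector is added to the input of the shared layer so the same weights can behave differently on each pass (the additive injection point and toy layer are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 3  # hypothetical width; K = 3 passes as in the card

# One learned embedding per pass (trained jointly with the model).
iter_embeddings = rng.standard_normal((K, D)) * 0.02
shared_w = rng.standard_normal((D, D)) * 0.1

def recurrent_forward(x):
    for k in range(K):
        # Add the pass-specific embedding so the shared layer can
        # condition on which iteration it is currently running.
        h = x + iter_embeddings[k]
        x = x + np.tanh(h @ shared_w)
    return x

y = recurrent_forward(rng.standard_normal((2, D)))
```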
Regularization
residual scaling
Scales the residual branch output by 1/sqrt(K), where K is the number of recurrent passes, to keep activations stable as effective depth grows.
parameters: {"scale":"1/sqrt(K)"}
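The 1/sqrt(K) scaling can be sketched as follows: each pass contributes a residual update scaled so the accumulated sum over K passes stays roughly unit variance rather than growing with K (the toy layer is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 3
scale = 1.0 / np.sqrt(K)  # the 1/sqrt(K) factor from the card
w = rng.standard_normal((D, D)) * 0.1

def pass_step(x):
    # Scale the residual branch so K accumulated updates do not
    # blow up the activation magnitude.
    return x + scale * np.tanh(x @ w)

x = rng.standard_normal((2, D))
for _ in range(K):
    x = pass_step(x)
```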
Evaluation
test-time compute scaling
Trains with 3 recurrent passes, then runs 6 or 8 passes at inference to trade extra compute for lower BPB without changing the model size.
parameters: {"train_passes":3,"inference_passes":[6,8]}
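Because the recurrent layer's weights are independent of the pass count, the same checkpoint can simply be unrolled for more passes at inference. A sketch (toy layer and width are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
w = rng.standard_normal((D, D)) * 0.1

def forward(x, passes):
    # Same shared weights regardless of pass count, so the artifact
    # on disk is unchanged when passes is increased at inference.
    for _ in range(passes):
        x = x + np.tanh(x @ w)
    return x

x = rng.standard_normal((2, D))
y_train = forward(x, passes=3)  # training configuration
y_eval = forward(x, passes=8)   # more test-time compute, same model
```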
Novel Contributions
- Depth recurrence using a shared recurrent layer applied K times to reduce parameter count.
- Learned iteration embeddings to distinguish recurrent passes.
- Residual scaling by 1/sqrt(K) for stability.
- Ability to increase K at inference time for better BPB without changing model size.
- U-Net-style encoder/decoder with skip connections combined with recurrent depth sharing.