PR #54

open

RQZ-Golf v1: Depth recurrence for parameter efficiency

by TheCauseView on GitHub
val_bpb
1.5283
Architecture
U-Net-style encoder/decoder with a recurrent shared layer
Optimizer
Artifact Size

Training Techniques

Architecture
depth recurrence
Replaces some unique layers with a single shared recurrent layer applied multiple times to save parameters while increasing effective depth.
parameters: {"unique_layers":7,"recurrent_passes":3,"effective_depth":10}
weight tying
Shares weights across recurrent passes of the same layer.
parameters: {"recurrent_passes":3}
iteration embeddings
Uses learned per-pass embeddings so the recurrent layer is aware of which iteration it is on.
parameters: {"passes":3}
Regularization
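residual scaling
Scales the recurrent layer's residual contribution by 1/sqrt(K), where K is the number of recurrent passes, for stability.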
parameters: {"scale":"1/sqrt(K)"}
Evaluation
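test-time compute scaling
Runs the shared recurrent layer for more passes at inference than during training to improve BPB without changing model size.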
parameters: {"train_passes":3,"inference_passes":[6,8]}
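
The techniques listed above fit together in a single loop. The following is a minimal sketch under stated assumptions, not the PR's code: `shared_layer` is a toy stand-in for the real block, and the names `recurrent_stack`, `iter_emb`, and `scale_k` are hypothetical. The card does not say how iteration embeddings are handled when inference uses more passes than training, nor which K the 1/sqrt(K) factor uses at inference. Here, extra passes reuse the last learned embedding and K defaults to the number of passes actually run; both are assumptions.

```python
import math


def shared_layer(h, weight):
    # Toy stand-in for the shared block (hypothetical): one scalar weight
    # followed by tanh. The real layer would be a full transformer block.
    return [math.tanh(weight * x) for x in h]


def recurrent_stack(h, weight, iter_emb, passes, scale_k=None):
    """Apply the same layer `passes` times with a scaled residual update."""
    # Assumption: K in 1/sqrt(K) is the number of passes run,
    # unless the caller pins it with scale_k.
    k = passes if scale_k is None else scale_k
    scale = 1.0 / math.sqrt(k)
    for i in range(passes):
        # Assumption: passes beyond the trained count reuse the last embedding.
        emb = iter_emb[min(i, len(iter_emb) - 1)]
        # The iteration embedding tells the shared layer which pass it is on.
        layer_in = [x + e for x, e in zip(h, emb)]
        # Weight tying: the same `weight` is used on every pass.
        layer_out = shared_layer(layer_in, weight)
        # Residual update scaled by 1/sqrt(K).
        h = [x + scale * o for x, o in zip(h, layer_out)]
    return h


# Three learned per-pass embeddings, matching "passes": 3 on the card.
iter_emb = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
hidden = [0.5, -1.0]

train_like = recurrent_stack(hidden, 0.7, iter_emb, passes=3)
more_compute = recurrent_stack(hidden, 0.7, iter_emb, passes=8, scale_k=3)
```

Calling `recurrent_stack` with `passes=8` adds no parameters: only the loop count changes, which is what lets the pass count be raised at inference.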

Novel Contributions

  • Depth recurrence using a shared recurrent layer applied K times to reduce parameter count.
  • Learned iteration embeddings to distinguish recurrent passes.
  • Residual scaling by 1/sqrt(K) for stability.
  • Ability to increase K at inference time for better BPB without changing model size.
  • U-Net-style encoder/decoder with skip connections combined with recurrent depth sharing.
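
The parameter saving behind the first contribution can be illustrated with simple arithmetic. This is a rough sketch with hypothetical names (`layer_budget`, `params_per_layer`). It assumes every layer has the same parameter count and that the shared layer is stored in addition to the 7 unique layers, which is one reading consistent with the listed effective depth of 10 (7 + 3). The card does not confirm either assumption.

```python
def layer_budget(unique_layers, recurrent_passes, params_per_layer):
    # Assumption: one shared layer is stored on top of the unique layers.
    stored_layers = unique_layers + 1
    # Each pass of the shared layer counts as one layer of effective depth.
    effective_depth = unique_layers + recurrent_passes
    return {
        "stored_layers": stored_layers,
        "effective_depth": effective_depth,
        # Parameters actually kept in the model.
        "stored_params": stored_layers * params_per_layer,
        # Parameters a model of the same depth would need without sharing.
        "unshared_params": effective_depth * params_per_layer,
    }


# params_per_layer is an arbitrary illustrative value, not from the PR.
budget = layer_budget(unique_layers=7, recurrent_passes=3, params_per_layer=1000)
```

Under these assumptions, raising `recurrent_passes` increases `effective_depth` while `stored_params` stays fixed.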