val_bpb: 1.5283
Architecture: U-Net-style encoder/decoder with a recurrent shared layer
Optimizer: —
Artifact Size: —
Training Techniques
Architecture
depth recurrence
Replaces some unique layers with a single shared recurrent layer applied multiple times to save parameters while increasing effective depth.
parameters: {"unique_layers":7,"recurrent_passes":3,"effective_depth":10}
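A minimal sketch of the depth-recurrence idea with toy numpy "layers" (the width `D`, the tanh nonlinearity, and the residual form are assumptions; the card only specifies 7 unique layers and 3 recurrent passes):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical hidden width

# Seven unique layers, each with its own weights (as in the card).
unique_weights = [rng.standard_normal((D, D)) * 0.1 for _ in range(7)]
# One shared layer reused for every recurrent pass.
shared_weight = rng.standard_normal((D, D)) * 0.1

def layer(x, w):
    # Stand-in residual layer: x + tanh(x @ w).
    return x + np.tanh(x @ w)

def forward(x, recurrent_passes=3):
    for w in unique_weights:           # 7 unique layers
        x = layer(x, w)
    for _ in range(recurrent_passes):  # shared layer applied K times
        x = layer(x, shared_weight)    # same weights on every pass
    return x

x = rng.standard_normal((2, D))
y = forward(x)  # effective depth 7 + 3 = 10 from only 8 weight matrices
```

The shared layer adds depth without adding parameters: ten layer applications are backed by eight parameter sets.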
weight tying
Shares weights across recurrent passes of the same layer.
parameters: {"recurrent_passes":3}
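The parameter saving from tying weights across the 3 recurrent passes can be counted directly (the width `D` is a hypothetical placeholder):

```python
D, K = 8, 3  # hypothetical width; K = 3 recurrent passes as in the card

# Untied: K distinct layers would need K separate weight matrices.
untied_params = K * D * D
# Tied: one shared weight matrix is reused on all K passes.
tied_params = D * D

print(untied_params, tied_params)  # tied version uses 1/K the parameters
```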
iteration embeddings
Uses learned per-pass embeddings so the recurrent layer is aware of which iteration it is on.
parameters: {"passes":3}
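One way iteration embeddings could be wired in, as a sketch: a learned per-pass vector is added to the input of the shared layer so the same weights can behave differently on each pass (the additive injection point and toy layer are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 3  # hypothetical width; K = 3 passes as in the card

# One learned embedding per pass (trained jointly with the model).
iter_embeddings = rng.standard_normal((K, D)) * 0.02
shared_w = rng.standard_normal((D, D)) * 0.1

def recurrent_forward(x):
    for k in range(K):
        # Add the pass-specific embedding so the shared layer can
        # condition on which iteration it is currently running.
        h = x + iter_embeddings[k]
        x = x + np.tanh(h @ shared_w)
    return x

y = recurrent_forward(rng.standard_normal((2, D)))
```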
Regularization
residual scaling
Scales the residual branch output by 1/sqrt(K), where K is the number of recurrent passes, to keep activations stable as effective depth grows.
parameters: {"scale":"1/sqrt(K)"}
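The 1/sqrt(K) scaling can be sketched as follows: each pass contributes a residual update scaled so the accumulated sum over K passes stays roughly unit variance rather than growing with K (the toy layer is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 3
scale = 1.0 / np.sqrt(K)  # the 1/sqrt(K) factor from the card
w = rng.standard_normal((D, D)) * 0.1

def pass_step(x):
    # Scale the residual branch so K accumulated updates do not
    # blow up the activation magnitude.
    return x + scale * np.tanh(x @ w)

x = rng.standard_normal((2, D))
for _ in range(K):
    x = pass_step(x)
```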
Evaluation
test-time compute scaling
Trains with 3 recurrent passes, then runs 6 or 8 passes at inference to trade extra compute for lower BPB without changing the model size.
parameters: {"train_passes":3,"inference_passes":[6,8]}
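Because the recurrent layer's weights are independent of the pass count, the same checkpoint can simply be unrolled for more passes at inference. A sketch (toy layer and width are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
w = rng.standard_normal((D, D)) * 0.1

def forward(x, passes):
    # Same shared weights regardless of pass count, so the artifact
    # on disk is unchanged when passes is increased at inference.
    for _ in range(passes):
        x = x + np.tanh(x @ w)
    return x

x = rng.standard_normal((2, D))
y_train = forward(x, passes=3)  # training configuration
y_eval = forward(x, passes=8)   # more test-time compute, same model
```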
Novel Contributions
- Depth recurrence using a shared recurrent layer applied K times to reduce parameter count.
- Learned iteration embeddings to distinguish recurrent passes.
- Residual scaling by 1/sqrt(K) for stability.
- Ability to increase K at inference time for better BPB without changing model size.
- U-Net-style encoder/decoder with skip connections combined with recurrent depth sharing.