PR #1577
open
Submission: aria-redefine-qbit - Hybrid Recurrent U-Net (1.40 BPB
by redefine-qbit
val_bpb: 1.4016
Architecture: Hybrid
Optimizer: Muon
Artifact Size: 16.53 MB
Training Techniques
Architecture
depth recurrence
The decoder applies the same weight-shared layers repeatedly in a loop, which increases effective (virtual) depth without increasing the parameter count.
parameters: {"recurrent_loops":1}
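The weight-sharing idea can be sketched with a toy example. Everything named here (`SharedBlock`, `run_decoder`, `num_passes`) is hypothetical and not taken from the submission; in particular, the source does not say how `recurrent_loops` maps to the number of passes, so the sketch uses its own `num_passes` argument.

```python
class SharedBlock:
    """Toy stand-in for a decoder layer: y = x + w * x (hypothetical)."""

    def __init__(self, w):
        self.w = w  # the only parameter of this block

    def __call__(self, x):
        return [xi + self.w * xi for xi in x]


def run_decoder(blocks, x, num_passes):
    """Apply the same list of blocks num_passes times (weight sharing)."""
    for _ in range(num_passes):
        for block in blocks:
            x = block(x)
    return x


def count_params(blocks):
    # Count unique block objects, so reuse does not inflate the total.
    return len({id(b) for b in blocks})


blocks = [SharedBlock(0.5), SharedBlock(-0.25)]
x = [1.0, 2.0]

one_pass = run_decoder(blocks, x, num_passes=1)
two_pass = run_decoder(blocks, x, num_passes=2)

effective_depth = len(blocks) * 2     # layers applied with two passes
stored_params = count_params(blocks)  # still two parameters
```

With two passes the input goes through four layer applications while only two blocks' worth of weights are stored, which is the trade the description above refers to.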
U-Net skip connections
Encoder outputs are cached and fused into decoder layers via weighted skip connections.
parameters: null
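A minimal sketch of cached, weighted skips follows, assuming the common U-Net pairing in which the last encoder output feeds the first decoder layer and fusion is a weighted addition. The pairing order, the additive fusion, and the names `layer`, `unet_forward`, and `skip_weights` are assumptions for illustration, not the submission's implementation.

```python
def layer(scale):
    """Hypothetical toy layer: multiplies every element by `scale`."""
    return lambda x: [scale * xi for xi in x]


def unet_forward(x, encoder, decoder, skip_weights):
    skips = []
    for enc in encoder:
        x = enc(x)
        skips.append(x)          # cache each encoder output
    for dec, w in zip(decoder, skip_weights):
        skip = skips.pop()       # last encoder output pairs with first decoder layer
        x = [xi + w * si for xi, si in zip(x, skip)]  # weighted fusion
        x = dec(x)
    return x


encoder = [layer(2.0), layer(2.0)]
decoder = [layer(0.5), layer(0.5)]
skip_weights = [0.5, 0.25]       # learnable in a real model; fixed here

out = unet_forward([1.0], encoder, decoder, skip_weights)
```

In a real model the skip weights would be trained parameters, letting each decoder layer decide how much of the earlier representation to reuse.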
MLP3x
MLP width is scaled relative to the baseline using an MLP_MULT factor, set here to 1.9 rather than the 3x implied by the MLP3x label.
parameters: {"mlp_mult":1.9}
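To make the effect of the multiplier concrete, the sketch below computes the hidden width and weight count of a two-matrix MLP. The model width of 512, the truncation to an integer, and the bias-free two-projection layout are all assumptions; the source states only `mlp_mult`.

```python
def mlp_hidden_dim(model_dim, mlp_mult):
    """Hidden width of the MLP; truncation to int is an assumption."""
    return int(mlp_mult * model_dim)


def mlp_param_count(model_dim, mlp_mult):
    """Weights in a two-matrix MLP (up and down projection), no biases."""
    hidden = mlp_hidden_dim(model_dim, mlp_mult)
    return 2 * model_dim * hidden


model_dim = 512  # hypothetical; the source does not state the model width
hidden = mlp_hidden_dim(model_dim, 1.9)
params = mlp_param_count(model_dim, 1.9)
params_3x = mlp_param_count(model_dim, 3.0)
```

Under these assumptions a 1.9x MLP stores roughly a million weights per layer, against about 1.57 million at 3x, which is the kind of saving that matters when the artifact size is constrained.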
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"newton_schulz_k":5,"torch_compile":true}
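The `newton_schulz_k` setting refers to the number of Newton-Schulz steps Muon uses to approximately orthogonalize each update matrix. The sketch below runs five steps in pure Python on a small matrix. The quintic coefficients are those of the public Muon reference implementation and are an assumption here, since the submission's values are not stated; the helper names are hypothetical, and `torch.compile` is not involved.

```python
import math

# Quintic coefficients from the public Muon reference implementation
# (an assumption here; the submission's values are not stated).
A, B, C = 3.4445, -4.7750, 2.0315


def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]


def transpose(m):
    return [list(col) for col in zip(*m)]


def axpy(alpha, p, beta, q):
    """Return alpha * p + beta * q elementwise."""
    return [[alpha * a + beta * b for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]


def newton_schulz(g, k=5, eps=1e-7):
    """Approximately orthogonalize g with k Newton-Schulz steps."""
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / (norm + eps) for v in row] for row in g]  # spectral norm <= 1
    for _ in range(k):
        gram = matmul(x, transpose(x))                  # X X^T
        poly = axpy(B, gram, C, matmul(gram, gram))     # b*A + c*A^2
        x = axpy(A, x, 1.0, matmul(poly, x))            # a*X + (b*A + c*A^2) X
    return x


update = newton_schulz([[3.0, 0.0], [0.0, 1.0]], k=5)
```

The input has singular values 3 and 1; after five steps both diagonal entries sit near 1 without matching it exactly. These coefficients favor fast convergence over an exact fixed point, so the result is only approximately orthogonal.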
Quantization
int8
bits: 8
scope: all
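One common way to realize 8-bit quantization is symmetric per-tensor scaling, sketched below. The source says only "int8, scope: all", so the symmetric scheme, the single per-tensor scale, and the function names are assumptions.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    return [qi * scale for qi in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The round-trip error per weight is bounded by about half the scale, so tensors with a few large outliers lose the most precision under a single shared scale.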
Compression
zlib
level: null
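As an illustration of the compression step, the sketch below packs hypothetical int8 weights into bytes and compresses them with Python's `zlib` module. Because the level is listed as null, the library default is used here as an assumption; the weight values are made up and deliberately repetitive.

```python
import struct
import zlib

# Hypothetical int8 weights; repeated values compress well.
q_weights = [0, 1, -1, 2] * 256

raw = struct.pack(f"{len(q_weights)}b", *q_weights)   # 1 byte per weight
packed = zlib.compress(raw)                           # library default level
restored = list(struct.unpack(f"{len(raw)}b", zlib.decompress(packed)))

raw_size, packed_size = len(raw), len(packed)
```

Real trained weights are far less regular than this toy input, so the achievable ratio on an actual checkpoint would be much smaller.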
Novel Contributions
- Hybrid recurrent U-Net transformer architecture
- Weight-shared decoder recurrence for virtual depth expansion
- U-Net skip connections for preserving high-frequency token information
- Entropy-aware MLP scaling at 1.9x
- Muon optimization with Newton-Schulz kernels compiled with torch.compile