val_bpb: 1.6004
Architecture: shared-loop recurrent transformer
Optimizer: —
Artifact Size: —
Training Techniques
Architecture: depth recurrence
Uses a shared-loop recurrent transformer whose looped layers reuse the same transformer block across multiple forward iterations, so depth comes from recurrence rather than additional parameters.
parameters: {"model_dim":512,"num_loop_iters":3,"min_loop_iters":1}
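The parameter-sharing idea above can be sketched in a few lines. This is a minimal illustration, not the submission's implementation: a single weight matrix stands in for a full transformer block, and `MODEL_DIM` and `NUM_LOOP_ITERS` are taken from the parameters listed above. The point is that the parameter count stays fixed no matter how many loop iterations are run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions taken from the submission card; the single matrix below is a
# stand-in for a full transformer block (attention + MLP), kept small for
# illustration only.
MODEL_DIM = 512
NUM_LOOP_ITERS = 3

W_shared = rng.standard_normal((MODEL_DIM, MODEL_DIM)) * (MODEL_DIM ** -0.5)

def shared_loop_forward(x, num_iters=NUM_LOOP_ITERS):
    """Apply the SAME block num_iters times (depth recurrence)."""
    for _ in range(num_iters):
        x = np.tanh(x @ W_shared)  # one pass through the shared block
    return x

x = rng.standard_normal((1, MODEL_DIM))
y = shared_loop_forward(x)

# Parameter count is independent of loop depth: one block's weights serve
# all iterations. A standard transformer with distinct per-layer weights
# would need NUM_LOOP_ITERS times as many parameters for the same depth.
shared_params = W_shared.size                    # 262144
unshared_params = NUM_LOOP_ITERS * W_shared.size # 786432
```

With `min_loop_iters` set to 1, the loop count could in principle be varied at inference time; the sketch fixes it at 3 for simplicity.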
Sequence Length
train_length: 1024
eval_length: 1024
Other
A non-record submission targeting the 10-minute 16MB track; the compact recurrent architecture converges stably within the runtime budget.
parameters: {"iterations":6000,"hardware":"8x H100","runtime_seconds":224}
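The reported numbers imply a simple throughput estimate. The arithmetic below uses only figures from this card (iterations, runtime, train_length); the batch size is not reported, so `BATCH_SIZE` is a purely illustrative assumption.

```python
# Figures reported on this card.
iterations = 6000
runtime_seconds = 224
train_length = 1024  # tokens per sequence

# Optimizer steps per second over the whole run.
iters_per_sec = iterations / runtime_seconds  # ~26.8 steps/s

# Token throughput depends on batch size, which the card does not report.
BATCH_SIZE = 32  # assumed for illustration only, not from the submission
tokens_per_sec = iters_per_sec * BATCH_SIZE * train_length
```

At the assumed batch size this works out to roughly 0.9M tokens/s across the 8x H100 machine; the true figure scales linearly with the actual batch size.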
Novel Contributions
- Shared-loop recurrent transformer architecture
- Compact 512-dimensional model for the 10-minute 16MB track
- Stable convergence within the runtime constraint
- Looped layers with recurrent depth sharing