depth recurrence / looped transformer
Architecture
Used in: 1 PR
Best BPB: 1.1462
Avg BPB: 1.1462
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 325 | {"num_layers":6,"loop_core_layers":2,"loop_repeats":5,"loop_attn_every":2,"effective_executed_layers":14} |
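The reported effective_executed_layers value for PR 325 is consistent with a simple reading of the other fields: the layers outside the looped core run once each, and the core layers run loop_repeats times. The sketch below checks that arithmetic. The formula and the helper name `effective_executed_layers` are assumptions inferred from the numbers in the table, not something the PR is confirmed to implement, and the role of loop_attn_every is not stated here, so it is left out of the calculation.

```python
import json

# Hyperparameters copied verbatim from the PR 325 row above.
params = json.loads(
    '{"num_layers":6,"loop_core_layers":2,"loop_repeats":5,'
    '"loop_attn_every":2,"effective_executed_layers":14}'
)


def effective_executed_layers(num_layers: int, loop_core_layers: int, loop_repeats: int) -> int:
    """Assumed formula: non-looped layers run once, core layers run loop_repeats times."""
    layers_outside_core = num_layers - loop_core_layers
    return layers_outside_core + loop_core_layers * loop_repeats


depth = effective_executed_layers(
    params["num_layers"], params["loop_core_layers"], params["loop_repeats"]
)

# Under this assumption, 6 stored layers give (6 - 2) + 2 * 5 executed layers.
print(depth, depth == params["effective_executed_layers"])
```

Under this reading, the model stores 6 layers of parameters but executes more than twice that depth per forward pass, which is the usual motivation for depth recurrence.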