depth recurrence / looped transformer

Architecture
Used in: 1 PR
Best BPB: 1.1462
Avg BPB: 1.1462

Hyperparameters Across PRs

pr_number    parameters
325          {"num_layers":6,"loop_core_layers":2,"loop_repeats":5,"loop_attn_every":2,"effective_executed_layers":14}
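
The reported effective_executed_layers value is consistent with one simple accounting, sketched below as an assumption rather than the PR's actual code: the layers outside the looped core run once, and the loop_core_layers run loop_repeats times. The helper name effective_layers is hypothetical, and loop_attn_every is deliberately left uninterpreted because the row does not say what it controls.

```python
import json

# Parameters copied verbatim from the PR 325 row.
params = json.loads(
    '{"num_layers":6,"loop_core_layers":2,"loop_repeats":5,'
    '"loop_attn_every":2,"effective_executed_layers":14}'
)


def effective_layers(num_layers: int, loop_core_layers: int, loop_repeats: int) -> int:
    """Assumed accounting (not confirmed by the source):
    non-looped layers execute once, core layers execute loop_repeats times."""
    non_looped = num_layers - loop_core_layers
    return non_looped + loop_core_layers * loop_repeats


computed = effective_layers(
    params["num_layers"], params["loop_core_layers"], params["loop_repeats"]
)
# Compare the assumed formula against the value reported in the row.
print(computed, computed == params["effective_executed_layers"])  # 14 True
```

Under this reading, (6 - 2) + 2 * 5 = 14, which matches the reported value; other accountings could produce the same number, so the match is suggestive only.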