depth recurrence, weight tying, tied embeddings, RoPE, ReLU² MLP 3×, GQA
Architecture
Used in: 1 PR
Best BPB: 1.1750
Avg BPB: 1.1750
Submissions
Hyperparameters Across PRs
| pr_number | layers | loop_blocks | loop_iters | embed_dim | num_heads | num_kv_heads | mlp_expansion |
|---|---|---|---|---|---|---|---|
| 575 | 6 | 2 | 3 | 512 | 8 | 8 | 3 |
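The hyperparameters for PR 575 imply a few layer sizes that are not listed directly. The following is a minimal sketch of that arithmetic, not code from the PR: `derived_shapes` and its output keys are hypothetical names, and the formulas are the usual transformer conventions (head dimension = `embed_dim / num_heads`, MLP hidden width = `mlp_expansion * embed_dim`). The `block_applications` entry rests on an assumption about how depth recurrence is configured here, namely that `loop_blocks` weight-tied blocks are each applied `loop_iters` times; the page itself does not define these fields.

```python
# Hyperparameters copied from the table row for PR 575.
params = {
    "layers": 6,
    "loop_blocks": 2,
    "loop_iters": 3,
    "embed_dim": 512,
    "num_heads": 8,
    "num_kv_heads": 8,
    "mlp_expansion": 3,
}


def derived_shapes(p):
    """Derive layer sizes implied by the hyperparameters (hypothetical helper)."""
    if p["embed_dim"] % p["num_heads"] != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    if p["num_heads"] % p["num_kv_heads"] != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")

    head_dim = p["embed_dim"] // p["num_heads"]
    return {
        # Width of each attention head.
        "head_dim": head_dim,
        # Hidden width of the MLP with a 3x expansion.
        "mlp_hidden": p["mlp_expansion"] * p["embed_dim"],
        # Number of query heads that share one key/value head.
        "queries_per_kv_head": p["num_heads"] // p["num_kv_heads"],
        # Output width of the key (or value) projection.
        "kv_proj_dim": p["num_kv_heads"] * head_dim,
        # ASSUMPTION: loop_blocks tied blocks, each applied loop_iters times.
        "block_applications": p["loop_blocks"] * p["loop_iters"],
    }


shapes = derived_shapes(params)
for name, value in shapes.items():
    print(f"{name}: {value}")
```

Under these conventions, `queries_per_kv_head` comes out to 1 because `num_kv_heads` equals `num_heads`, so each query head has its own key/value head in this configuration. Under the stated recurrence assumption, `block_applications` equals the listed `layers` value of 6.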