Transformer
Architecture

| Used in | Best BPB | Avg BPB |
|---|---|---|
| 8 PRs | 1.0717 | 1.2851 |
Submissions
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 284 | {"layers":10} |
| 298 | {"dim":768} |
| 724 | {"layers":10,"dimensions":512,"gqa":"8/4","bigramhash_buckets":10240} |
| 985 | {"dimensions":800,"layers":6,"heads":10} |
| 1116 | {"layers":11} |
| 1167 | {"layers":10} |
| 1357 | {"layers":12,"model_dim":512,"attention_heads":8,"kv_heads":4,"mlp_multiplier":3,"mlp_hidden":1536,"rope_dims":"16/64","vocab_size":1024,"bigram_buckets":1536} |
| 1505 | {"layers":11} |
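
The PRs above report the same quantities under different key names (for example `dim`, `dimensions`, and `model_dim`), which makes the rows hard to compare directly. The following is a minimal sketch of one way to line them up. The `ALIASES` map and the helpers `normalize` and `values_for` are illustrative assumptions, not part of any PR. Treating those keys as synonyms is a guess based on their names, and values such as `gqa` and `rope_dims` are left uninterpreted.

```python
import json

# Rows copied verbatim from the table above: (PR number, raw JSON parameters).
ROWS = [
    (284, '{"layers":10}'),
    (298, '{"dim":768}'),
    (724, '{"layers":10,"dimensions":512,"gqa":"8/4","bigramhash_buckets":10240}'),
    (985, '{"dimensions":800,"layers":6,"heads":10}'),
    (1116, '{"layers":11}'),
    (1167, '{"layers":10}'),
    (1357, '{"layers":12,"model_dim":512,"attention_heads":8,"kv_heads":4,'
           '"mlp_multiplier":3,"mlp_hidden":1536,"rope_dims":"16/64",'
           '"vocab_size":1024,"bigram_buckets":1536}'),
    (1505, '{"layers":11}'),
]

# ASSUMPTION: these keys are treated as synonyms based on their names only.
ALIASES = {
    "dim": "model_dim",
    "dimensions": "model_dim",
    "heads": "attention_heads",
    "bigramhash_buckets": "bigram_buckets",
}


def normalize(raw: str) -> dict:
    """Parse one JSON cell and rename aliased keys to a canonical name."""
    params = json.loads(raw)
    return {ALIASES.get(key, key): value for key, value in params.items()}


# PR number -> parameters with canonical key names.
NORMALIZED = {pr: normalize(raw) for pr, raw in ROWS}


def values_for(key: str) -> dict:
    """Return {PR number: value} for every PR that reports the given key."""
    return {pr: p[key] for pr, p in NORMALIZED.items() if key in p}


if __name__ == "__main__":
    for name in ("layers", "model_dim", "attention_heads", "bigram_buckets"):
        print(name, values_for(name))
```

Under this mapping, four of the eight PRs report a model dimension and seven report a layer count. PR 1357 is also internally consistent: its `mlp_hidden` of 1536 equals `mlp_multiplier` (3) times `model_dim` (512).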