Byte-level transformer
Architecture
Used in: 1 PR
Best BPB: 1.1903
Avg BPB: 1.1903
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 832 | {"vocab_size":260,"layers":13,"dim":512,"num_heads":8,"num_kv_heads":4} |
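
The hyperparameters in the row above imply a few shapes that are easy to check by hand. The following is a minimal sketch that derives them from the listed values only. The function name `derived_shapes` is hypothetical, and the attention parameter count assumes a standard grouped-query attention layout with bias-free Q, K, V, and output projections, which the table does not confirm. MLP width, embedding tying, and normalization are not listed, so no total parameter count is attempted.

```python
import json

# Hyperparameters copied verbatim from the table row for PR 832.
params = json.loads(
    '{"vocab_size":260,"layers":13,"dim":512,"num_heads":8,"num_kv_heads":4}'
)


def derived_shapes(p):
    """Derive shapes that follow directly from the listed hyperparameters."""
    # Per-head width: the model dimension is split evenly across query heads.
    head_dim, remainder = divmod(p["dim"], p["num_heads"])
    if remainder != 0:
        raise ValueError("dim must be divisible by num_heads")
    if p["num_heads"] % p["num_kv_heads"] != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")

    # Number of query heads that share each key/value head.
    queries_per_kv = p["num_heads"] // p["num_kv_heads"]
    # Width of the key (or value) projection output.
    kv_dim = head_dim * p["num_kv_heads"]

    # ASSUMPTION: bias-free Q and output projections (dim x dim each)
    # plus bias-free K and V projections (dim x kv_dim each).
    attn_params_per_layer = 2 * p["dim"] * p["dim"] + 2 * p["dim"] * kv_dim

    return {
        "head_dim": head_dim,
        "queries_per_kv": queries_per_kv,
        "kv_dim": kv_dim,
        "embedding_params": p["vocab_size"] * p["dim"],
        "attn_params_per_layer": attn_params_per_layer,
        "attn_params_total": attn_params_per_layer * p["layers"],
    }


shapes = derived_shapes(params)
for name, value in shapes.items():
    print(f"{name}: {value}")
```

Under these assumptions each head is 64 wide and two query heads share each key/value head. A vocabulary of 260 is consistent with 256 byte values plus 4 special tokens, but the table does not say how the extra entries are used.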