
Byte-level transformer

Architecture
Used in: 1 PR
Best BPB: 1.1903
Avg BPB: 1.1903

Hyperparameters Across PRs

pr_number    parameters
832          {"vocab_size":260,"layers":13,"dim":512,"num_heads":8,"num_kv_heads":4}
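
The listed hyperparameters imply a few derived sizes that are easy to check by hand. The sketch below is illustrative only: it assumes that num_kv_heads smaller than num_heads means conventional grouped-query attention, and that vocab_size counts the 256 possible byte values plus a few extra entries. Neither assumption is confirmed by the PR, and the helper name attention_shapes is hypothetical.

```python
import json

# Hyperparameters exactly as listed for PR 832.
params = json.loads(
    '{"vocab_size":260,"layers":13,"dim":512,"num_heads":8,"num_kv_heads":4}'
)


def attention_shapes(p):
    """Derive per-head sizes, ASSUMING standard grouped-query attention.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if p["dim"] % p["num_heads"] != 0:
        raise ValueError("dim must be divisible by num_heads")
    if p["num_heads"] % p["num_kv_heads"] != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")

    head_dim = p["dim"] // p["num_heads"]
    return {
        # Width of each attention head.
        "head_dim": head_dim,
        # Number of query heads that share one key/value head.
        "queries_per_kv_head": p["num_heads"] // p["num_kv_heads"],
        # Output width of each of the key and value projections.
        "kv_dim": head_dim * p["num_kv_heads"],
        # Vocabulary entries beyond the 256 raw byte values
        # (ASSUMPTION: these are special tokens; the source does not say).
        "extra_vocab_entries": p["vocab_size"] - 256,
    }


shapes = attention_shapes(params)
print(shapes)
```

Under these assumptions, each head is 64 wide, two query heads share each key/value head, and the key and value projections are each 256 wide, half the width of the query projection.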