SmearGate + BigramHash
Category: Architecture
Used in: 2 PRs
Best BPB: 1.1511
Avg BPB: 1.1750
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 510 | {"BigramHash_size":10240,"BigramHash_dim":128,"layers":10,"hidden_dim":1536,"heads":8,"KV_heads":4} |
| 538 | {"layers":10,"dimensions":512,"mlp_multiplier":3,"bigram_vocab_size":10240,"bigram_dim":128,"heads":8,"kv_heads":4} |
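The two PRs report overlapping settings under different key names, so a side-by-side comparison needs a small normalization step. The sketch below is one way to do that. The parameter strings are copied from the table above; the `ALIASES` map and the `normalize` helper are hypothetical and rest on the assumption that `BigramHash_size`/`bigram_vocab_size`, `BigramHash_dim`/`bigram_dim`, and `KV_heads`/`kv_heads` name the same quantities. The entry count at the end further assumes the bigram table is a plain `(size, dim)` embedding, which the source does not state.

```python
import json

# Parameter strings copied verbatim from the table above.
RAW = {
    510: '{"BigramHash_size":10240,"BigramHash_dim":128,"layers":10,'
         '"hidden_dim":1536,"heads":8,"KV_heads":4}',
    538: '{"layers":10,"dimensions":512,"mlp_multiplier":3,'
         '"bigram_vocab_size":10240,"bigram_dim":128,"heads":8,"kv_heads":4}',
}

# Hypothetical alias map (an assumption, not taken from either PR):
# keys on the left are treated as synonyms of the keys on the right.
ALIASES = {
    "BigramHash_size": "bigram_vocab_size",
    "BigramHash_dim": "bigram_dim",
    "KV_heads": "kv_heads",
}


def normalize(params):
    """Rename aliased keys so both PRs use one vocabulary."""
    return {ALIASES.get(key, key): value for key, value in params.items()}


configs = {pr: normalize(json.loads(text)) for pr, text in RAW.items()}

# Keys present in both PRs, and the subset whose values match.
shared = set(configs[510]) & set(configs[538])
agreeing = {
    key: configs[510][key]
    for key in sorted(shared)
    if configs[510][key] == configs[538][key]
}

# Keys reported by only one of the two PRs (left unmapped on purpose).
only_510 = sorted(set(configs[510]) - shared)
only_538 = sorted(set(configs[538]) - shared)

# Entry count of the bigram table, assuming a plain (size, dim) embedding.
bigram_table_entries = agreeing["bigram_vocab_size"] * agreeing["bigram_dim"]

print("agreeing:", agreeing)
print("only in 510:", only_510)
print("only in 538:", only_538)
print("bigram table entries:", bigram_table_entries)
```

Under these assumptions the two PRs agree on every shared setting. The keys left unmapped are `hidden_dim` (PR 510) and `dimensions` plus `mlp_multiplier` (PR 538). For PR 538, 512 × 3 = 1536 equals PR 510's `hidden_dim`, but the source does not say whether those refer to the same quantity, so the sketch does not merge them.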