
SmearGate + BigramHash

Category: Architecture
Used in: 2 PRs
Best BPB (bits per byte; lower is better): 1.1511
Avg BPB: 1.1750

Hyperparameters Across PRs

PR     Parameters
510    {"BigramHash_size":10240,"BigramHash_dim":128,"layers":10,"hidden_dim":1536,"heads":8,"KV_heads":4}
538    {"layers":10,"dimensions":512,"mlp_multiplier":3,"bigram_vocab_size":10240,"bigram_dim":128,"heads":8,"kv_heads":4}
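The table lists only sizes, not how the two components work. The sketch below assumes a common reading of the names: BigramHash hashes each (previous token, current token) pair into one of bigram_vocab_size buckets (10240), looks up a bigram_dim-wide vector (128), and projects it to model width (512, as listed for PR 538) before adding it to the token embedding; SmearGate blends each position with the previous one through a learned per-dimension sigmoid gate. The hash function, the gate form, the initialization, and the names bigram_bucket, BigramHash and smear_gate are hypothetical and are not taken from PR 510 or PR 538.

```python
import math
import random

# Sizes come from the hyperparameter table; everything else here is an assumption.
BIGRAM_VOCAB_SIZE = 10240  # hash buckets (BigramHash_size / bigram_vocab_size)
BIGRAM_DIM = 128           # per-bucket embedding width (BigramHash_dim / bigram_dim)
MODEL_DIM = 512            # model width ("dimensions" in PR 538)


def bigram_bucket(prev_tok: int, cur_tok: int, size: int = BIGRAM_VOCAB_SIZE) -> int:
    """Hypothetical hash: map a (previous, current) token pair to a bucket id in [0, size)."""
    return (prev_tok * 1_000_003 + cur_tok * 7919) % size


class BigramHash:
    """Hypothetical hashed-bigram embedding: bucket lookup, then a linear projection to model width."""

    def __init__(self, size=BIGRAM_VOCAB_SIZE, dim=BIGRAM_DIM, model_dim=MODEL_DIM, seed=0):
        rng = random.Random(seed)
        self.size = size
        # size x dim lookup table and dim x model_dim projection, small random init (assumed)
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(size)]
        self.proj = [[rng.gauss(0.0, 0.02) for _ in range(model_dim)] for _ in range(dim)]

    def __call__(self, tokens):
        out = []
        for t, cur in enumerate(tokens):
            # Assumption: token id 0 stands in for "no previous token" at position 0.
            prev = tokens[t - 1] if t > 0 else 0
            row = self.table[bigram_bucket(prev, cur, self.size)]
            # row (dim) @ proj (dim x model_dim) -> vector of length model_dim
            out.append([sum(r * p_row[j] for r, p_row in zip(row, self.proj))
                        for j in range(len(self.proj[0]))])
        return out


def smear_gate(x, gate_logits):
    """Hypothetical SmearGate: out_t = (1 - g) * x_t + g * x_{t-1}, with g = sigmoid(logit) per dimension."""
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    out = [list(x[0])]  # the first position has no predecessor, so it passes through unchanged
    for t in range(1, len(x)):
        out.append([(1.0 - g) * cur + g * prev for g, cur, prev in zip(gates, x[t], x[t - 1])])
    return out


# Usage: add hashed-bigram features to (placeholder) token embeddings, then smear.
tokens = [17, 4, 17, 4]
tok_emb = [[0.0] * MODEL_DIM for _ in tokens]  # stand-in for a real token embedding lookup
bigram = BigramHash()
h = bigram(tokens)
x = [[a + b for a, b in zip(e, v)] for e, v in zip(tok_emb, h)]
x = smear_gate(x, gate_logits=[0.0] * MODEL_DIM)  # logit 0 gives gate 0.5 in every dimension
```

Because the hash sees only the token pair, a repeated bigram (here (17, 4) at positions 1 and 3) always lands in the same bucket; distinct pairs can also collide, which is the trade-off for a fixed 10240-row table instead of a full vocabulary-squared one.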