Partial RoPE, XSA, BigramHash, VE128, SmearGate, logit softcap, tied embeddings
Architecture
Used in: 1 PR
Best BPB: 1.1215
Avg BPB: 1.1215
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 578 | {"layers":11,"dimension":512,"heads":8,"kv_heads":4,"mlp_expansion":3,"bigram_hash_buckets":2048,"ve_layers":[9,10],"logit_softcap":30} |
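
To make the parameter blob easier to read, here is a minimal sketch that parses the row for PR 578 and derives a few quantities from it. The interpretations are assumptions based on common transformer conventions, not something this page confirms: `dimension / heads` as the per-head width, `heads / kv_heads` as the grouped-query sharing factor, `mlp_expansion * dimension` as the MLP inner width, and `logit_softcap` as the cap in the usual tanh soft-capping formula. The helper name `softcap` is hypothetical.

```python
import json
import math

# Parameters copied verbatim from the PR 578 row above.
raw = (
    '{"layers":11,"dimension":512,"heads":8,"kv_heads":4,'
    '"mlp_expansion":3,"bigram_hash_buckets":2048,'
    '"ve_layers":[9,10],"logit_softcap":30}'
)
cfg = json.loads(raw)

# Assumed conventional interpretations (not confirmed by the source).
head_dim = cfg["dimension"] // cfg["heads"]           # width of each attention head
queries_per_kv = cfg["heads"] // cfg["kv_heads"]      # query heads sharing one KV head
mlp_hidden = cfg["mlp_expansion"] * cfg["dimension"]  # inner width of the MLP


def softcap(logit: float, cap: float) -> float:
    """Tanh soft-capping: squashes a logit so its magnitude never exceeds cap."""
    return cap * math.tanh(logit / cap)


print(head_dim, queries_per_kv, mlp_hidden)  # 64 2 1536
```

Under these assumptions the model would use 64-wide heads, with each key/value head shared by two query heads. With a cap of 30, small logits pass through almost unchanged while large ones saturate near ±30.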