Partial RoPE, XSA, BigramHash, VE128, SmearGate, logit softcap, tied embeddings

Architecture
Used in: 1 PR
Best BPB: 1.1215
Avg BPB: 1.1215

Hyperparameters Across PRs

pr_number    parameters
578          {"layers":11,"dimension":512,"heads":8,"kv_heads":4,"mlp_expansion":3,"bigram_hash_buckets":2048,"ve_layers":[9,10],"logit_softcap":30}
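
As a rough illustration of what the PR 578 parameters imply, the sketch below parses the row and derives a few shape quantities. It rests on assumptions this page does not confirm: that "dimension" is the model width split evenly across "heads", that "kv_heads" denotes grouped-query attention, that "mlp_expansion" multiplies the model width, and that the logit softcap takes the common cap * tanh(logit / cap) form. The names head_dim, queries_per_kv, mlp_hidden, and softcap are hypothetical and chosen only for this example.

```python
import json
import math

# Parameters copied verbatim from the PR 578 row.
params = json.loads(
    '{"layers":11,"dimension":512,"heads":8,"kv_heads":4,'
    '"mlp_expansion":3,"bigram_hash_buckets":2048,'
    '"ve_layers":[9,10],"logit_softcap":30}'
)

# Assumption: the model width is split evenly across query heads.
head_dim = params["dimension"] // params["heads"]

# Assumption: kv_heads < heads means grouped-query attention,
# so several query heads share each key/value head.
queries_per_kv = params["heads"] // params["kv_heads"]

# Assumption: the MLP hidden width is the model width times the expansion.
mlp_hidden = params["dimension"] * params["mlp_expansion"]


def softcap(logit, cap=params["logit_softcap"]):
    """Squash a logit into [-cap, cap] using the common tanh form (assumed)."""
    return cap * math.tanh(logit / cap)


print(head_dim, queries_per_kv, mlp_hidden)
```

Under these assumptions, small logits pass through nearly unchanged while large ones saturate near the cap of 30, which is the usual motivation for this kind of capping.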