← Back to Architecture

BigramLogitHead

Architecture
Used in
3 PRs
Best BPB
1.1522
Avg BPB
1.1522

Hyperparameters Across PRs

pr_numberparameters
477{"size":"1024x1024"}
482{"size":"1024x1024"}
485{"vocab_size":1024,"clipping":"[-4, 4]","smoothing_alpha":0.25}