
logit bias

Architecture
Used in: 1 PR
Best BPB: 0.9300
Avg BPB: 0.9300

Hyperparameters Across PRs

pr_number  parameters
1229       {"shape":"[bsz,1,vocab]"}
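The only recorded hyperparameter is the bias shape, [bsz,1,vocab]. A minimal sketch of what that shape could imply, assuming the bias is simply added to logits of shape [bsz,seq_len,vocab] and broadcast over the sequence axis. The source does not say how PR 1229 applies the bias; the function name `add_logit_bias` and the toy values are hypothetical and used only for illustration.

```python
def add_logit_bias(logits, bias):
    """Add a per-example bias to the logits at every sequence position.

    Assumed shapes (nested lists):
      logits: [bsz][seq_len][vocab]
      bias:   [bsz][1][vocab]  (singleton middle axis, as in the table above)
    """
    if len(bias) != len(logits):
        raise ValueError("batch sizes differ")
    out = []
    for seq_logits, seq_bias in zip(logits, bias):
        if len(seq_bias) != 1:
            raise ValueError("bias must have a singleton middle axis")
        row_bias = seq_bias[0]
        new_seq = []
        for position in seq_logits:
            if len(position) != len(row_bias):
                raise ValueError("vocab sizes differ")
            # The same bias row is reused for every position in the sequence.
            new_seq.append([x + b for x, b in zip(position, row_bias)])
        out.append(new_seq)
    return out


# Toy example (hypothetical values): bsz=2, seq_len=2, vocab=3.
logits = [
    [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]],
    [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]],
]
bias = [
    [[1.0, 0.0, -1.0]],
    [[0.0, 2.0, 0.0]],
]
biased = add_logit_bias(logits, bias)
```

With array libraries the same effect usually falls out of ordinary broadcasting, since a size-1 axis is stretched to match `seq_len`; the explicit loop here only makes that reuse visible.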