MLP3

Architecture
Used in: 1 PR
Best BPB: 1.3967
Avg BPB: 1.3967
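BPB is presumably bits per byte, a tokenizer-independent metric where lower is better. The sketch below assumes it is computed in the usual way: take the model's summed cross-entropy over the evaluation text in nats, convert it to bits by dividing by ln 2, and then divide by the number of raw bytes scored. The function name `bits_per_byte` and the example numbers are hypothetical and for illustration only.

```python
import math

def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed token-level cross-entropy (natural-log units) to bits per byte.

    ASSUMPTION: this mirrors how the BPB figures above were computed;
    the source page does not define the metric.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # nats -> bits is a division by ln(2); then normalise per raw byte.
    return total_loss_nats / (math.log(2) * total_bytes)

# Hypothetical example: ln(2) * 1000 nats spread over 1000 bytes is 1 bit/byte.
print(bits_per_byte(math.log(2) * 1000, 1000))
```

Because the normalisation is per byte rather than per token, models with different vocabularies can be compared on the same scale.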

Hyperparameters Across PRs

PR number   Parameters
2113        {"layers": 4, "dimensions": 416, "kv": 1, "vocab_size": 2050}
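The page lists only the raw hyperparameters. As a minimal sketch of what they imply, the snippet below loads the PR 2113 values and derives a few sizes. It rests on loudly stated assumptions that the source does not confirm: "MLP3" is read as an MLP hidden width of 3× the model dimension, `dimensions` is taken as the model width, `kv` is taken as the number of key/value heads, and the MLP is assumed to be two bias-free linear projections. `ModelConfig` and `MLP_MULT` are hypothetical names.

```python
import json
from dataclasses import dataclass

# Hyperparameters for PR 2113, copied from the table above.
RAW = '{"layers":4,"dimensions":416,"kv":1,"vocab_size":2050}'

# ASSUMPTION: "MLP3" means the MLP hidden width is 3x the model dimension.
MLP_MULT = 3

@dataclass(frozen=True)
class ModelConfig:
    layers: int
    dimensions: int  # assumed: model (residual stream) width
    kv: int          # assumed: number of key/value heads
    vocab_size: int

    @property
    def mlp_hidden(self) -> int:
        # Hidden width of each MLP block under the 3x assumption.
        return MLP_MULT * self.dimensions

    def mlp_params_per_layer(self) -> int:
        # ASSUMPTION: two bias-free projections, dim -> hidden and hidden -> dim.
        return 2 * self.dimensions * self.mlp_hidden

    def embedding_params(self) -> int:
        # Token embedding table: one vector of size `dimensions` per vocab entry.
        return self.vocab_size * self.dimensions

cfg = ModelConfig(**json.loads(RAW))
print(cfg.mlp_hidden)                          # 1248
print(cfg.mlp_params_per_layer())              # 1038336
print(cfg.layers * cfg.mlp_params_per_layer()) # 4153344
print(cfg.embedding_params())                  # 852800
```

Under these assumptions the MLP weights across all four layers (about 4.15M) outweigh the embedding table (about 0.85M); if "MLP3" means something else, only `MLP_MULT` needs to change.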