← Back to Architecture

layerwise LN scale

Architecture
Used in
3 PRs
Best BPB
1.0752
Avg BPB
1.1174

Hyperparameters Across PRs

pr_numberparameters
492
585
1584