depth-aware constant scale init
Category: Initialization
Used in: 1 PR
Best BPB: 1.1920
Avg BPB: 1.1920
Submissions
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 1496 | {"attn_scale":{"early":1,"mid":1.75,"late":2.5},"mlp_scale":{"early":1,"mid":1.15,"late":1.3}} |
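
As a rough illustration of how the PR 1496 parameters might be consumed, the sketch below maps a layer index to a depth band and looks up the matching constant scale. The banding rule (splitting the layer stack into three roughly equal thirds) and the helper names `depth_band` and `constant_scale` are assumptions made for this example; the table does not say how the PR defines "early", "mid", and "late" or where the scales are applied.

```python
import json

# Hyperparameters copied verbatim from the PR 1496 row of the table.
CONFIG = json.loads(
    '{"attn_scale":{"early":1,"mid":1.75,"late":2.5},'
    '"mlp_scale":{"early":1,"mid":1.15,"late":1.3}}'
)


def depth_band(layer_idx: int, num_layers: int) -> str:
    """Return 'early', 'mid', or 'late' for a layer.

    ASSUMPTION: the stack is split into three roughly equal thirds.
    The actual PR may draw the band boundaries differently.
    """
    if not 0 <= layer_idx < num_layers:
        raise ValueError("layer_idx out of range")
    third = 3 * layer_idx // num_layers  # 0, 1, or 2
    return ("early", "mid", "late")[third]


def constant_scale(kind: str, layer_idx: int, num_layers: int) -> float:
    """Look up the constant init scale for kind 'attn' or 'mlp' (hypothetical helper)."""
    return float(CONFIG[f"{kind}_scale"][depth_band(layer_idx, num_layers)])


if __name__ == "__main__":
    # Print the per-layer scales for an illustrative 9-layer model.
    for i in range(9):
        print(i, depth_band(i, 9), constant_scale("attn", i, 9), constant_scale("mlp", i, 9))
```

Under these assumptions the attention scale grows faster with depth (1 to 2.5) than the MLP scale (1 to 1.3), which is the only relationship the table itself confirms.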