layerwise LN scale
Category: Regularization
Used in: 124 PRs
Best BPB: 0.0281
Avg BPB: 1.0765
Submissions
| PR | Author | BPB | Notes |
|---|---|---|---|
| 175 | anthony-maio | 1.1229 | |
| 315 | jfprincz | 1.1248 | |
| 325 | Aum08Desai | 1.1462 | |
| 376 | anthony-maio | 1.1399 | |
| 379 | dannywillowliu-uchi | 1.1257 | |
| 383 | joelnishanth | 1.1320 | |
| 388 | ElliotSlusky | 1.1231 | |
| 389 | trasnake87 | 1.1466 | |
| 398 | felipe-parodi | 1.1213 | |
| 400 | chanwoo-park-official | 1.1296 | |
| 401 | newjordan | 1.1243 | |
| 410 | EthanYangTW | 1.1216 | |
| 414 | signalrush | 1.1233 | |
| 415 | EthanYangTW | 1.1216 | |
| 417 | EthanYangTW | 1.1227 | |
| 418 | yashverms | 1.1715 | |
| 421 | vytautas-bunevicius | 1.1466 | |
| 442 | sjp611 | 1.1027 | |
| 445 | newjordan | 1.1236 | |
| 450 | zachgoldfine44 | 1.1466 | |
| 453 | Divyesh-Thirukonda | 1.1248 | |
| 455 | kasimte | 1.1299 | |
| 461 | Christopher-Lee-McClendon | 1.1446 | |
| 478 | gowtham0992 | 1.1268 | |
| 481 | mrdavtan | 1.0970 | |
| 482 | harsha-gouru | 1.1522 | |
| 485 | harsha-gouru | 1.1522 | |
| 486 | ndokutovich | 1.1101 | |
| 487 | anantdgoel | 1.1720 | |
| 498 | newjordan | 1.1478 | |
| 499 | newjordan | 1.1478 | |
| 503 | EthanYangTW | 1.1195 | |
| 507 | skarakulak | 1.1558 | |
| 528 | EthanYangTW | 1.1195 | |
| 529 | EthanYangTW | 1.1195 | |
| 532 | NotADevIAmaMeatPopsicle | 1.0487 | |
| 534 | rarce | 1.1804 | |
| 545 | EthanYangTW | 1.1179 | |
| 549 | abaybektursun | 1.1194 | RECORD |
| 573 | Sarimsaljook | 1.0523 | |
| 579 | newjordan | 1.1355 | |
| 589 | RoyiRa | 1.1178 | |
| 593 | abaybektursun | 1.1163 | |
| 634 | raahilshah | 1.1171 | |
| 638 | Asukabot0 | 1.1164 | |
| 642 | minh-stakc | 0.8173 | |
| 648 | maorinka | 1.1428 | |
| 657 | anthony-maio | 1.1234 | |
| 659 | deanbrr | 1.0920 | |
| 661 | andrewbaggio1 | 1.1175 | |
| 672 | andrewbaggio1 | 1.0781 | |
| 681 | Alfaxad | 1.4775 | |
| 682 | gthgomez | 1.1233 | |
| 688 | RoyiRa | 1.0745 | |
| 695 | 0xNoramiya | 1.1360 | |
| 698 | hesong0222-dev | 1.1642 | |
| 703 | Gusanidas | 1.1176 | |
| 710 | Dhruba531 | 1.1240 | |
| 714 | Upsalla | 1.1187 | |
| 720 | agalimova | 1.1078 | |
| 726 | DeepReinforce | 1.1147 | |
| 728 | abaybektursun | 1.1142 | |
| 733 | stukenov | 1.0278 | |
| 738 | gowtham0992 | 1.0970 | |
| 741 | andrewbaggio1 | 0.9850 | |
| 752 | Naazimsnh02 | 1.1182 | |
| 763 | hypery11 | 0.9917 | |
| 770 | minh-stakc | 0.6672 | |
| 771 | sunnypatneedi | 1.0705 | |
| 774 | travispchen | 0.9370 | |
| 779 | deanbrr | 0.6683 | |
| 785 | SirSaltySalmon | 1.5364 | |
| 786 | shinegami-2002 | 0.8128 | |
| 794 | jeremyschied | 1.3346 | |
| 796 | Robby955 | 0.6567 | |
| 797 | armantsaturian | 0.8960 | |
| 808 | Naazimsnh02 | 0.6364 | |
| 826 | himanshudongre | 0.2951 | |
| 827 | Programmerryoki | 1.3999 | |
| 832 | jfprincz | 1.1903 | |
| 834 | AnirudhRahul | 0.1663 | |
| 838 | aryanbhosale | 1.1215 | |
| 841 | someone114514 | 1.1157 | |
| 855 | aazizyan | 1.2659 | |
| 872 | gowtham0992 | 1.0467 | |
| 925 | THUQiXuan | 0.0281 | |
| 991 | ibarrajo | 1.1145 | |
| 1008 | monkeyKingProgrammer | 1.1538 | |
| 1077 | malc3om | 1.1130 | |
| 1096 | vimeto | 1.3342 | |
| 1101 | amrayach | 1.1290 | |
| 1231 | nestamidavaine | 1.1163 | |
| 1274 | MatoTeziTanka | 1.0876 | |
| 1319 | canivel | 0.6951 | |
| 1413 | dexhunter | 1.0828 | RECORD |
| 1467 | PhamPhuHoa-23 | 1.1056 | |
| 1492 | bigbag | 1.0810 | |
| 1514 | dexhunter | 1.0798 | |
| 1515 | dexhunter | 1.0872 | |
| 1520 | taka6745 | 1.0824 | |
| 1536 | dexhunter | 1.0775 | |
| 1539 | translatingthename | 1.0587 | |
| 1541 | bigbag | 1.0778 | |
| 1546 | SPThole | 1.0850 | |
| 1555 | andrewbaggio1 | 1.0764 | |
| 1570 | yufang67 | 1.0970 | |
| 1586 | dexhunter | 1.0749 | |
| 1602 | SPThole | 1.0744 | |
| 1616 | Vickyrrrrrr | 1.4100 | |
| 1630 | KevinChunye | 1.1412 | |
| 1639 | kunwar-vikrant | 1.0832 | |
| 1658 | AVINASH0052 | 1.0810 | |
| 1661 | anderamondarainh-stack | 1.1444 | |
| 1667 | MarioPaerle | 1.0714 | |
| 1670 | dexhunter | 1.0597 | |
| 1676 | aazizyan | 1.0788 | |
| 1688 | Buld1n | 1.0809 | |
| 1689 | chris-colinsky | 1.0822 | |
| 1693 | dexhunter | 1.0573 | |
| 1696 | kings-crown | 1.1224 | |
| 1714 | Anakintano | 1.0857 | |
| 1715 | G3sparky | 1.0809 | |
| 1720 | kiyoaki | 1.0818 | |
| 1737 | sakthivarshans | 1.0723 | |

Hyperparameters Across PRs
| PR | Parameters |
|---|---|
| 175 | {"scale":"1/sqrt(i+1)"} |
| 315 | {"scale":"1/sqrt(layer_idx+1)"} |
| 325 | {"ln_scale":1} |
| 376 | {"formula":"1/sqrt(layer_idx+1)"} |
| 379 | {"scale_rule":"1/sqrt(layer_idx+1)"} |
| 383 | {"scale":"1/sqrt(layer_idx+1)"} |
| 388 | {"scale_factor":"1/sqrt(layer_idx+1)"} |
| 389 | {"scale":"1/sqrt(layer_idx+1)"} |
| 398 | {"scale":"1/sqrt(layer+1)"} |
| 400 | {"enabled":true} |
| 401 | {"scale_rule":"1/sqrt(layer_idx+1)"} |
| 410 | {"ln_scale":true} |
| 414 | {"scale":"1/sqrt(layer_idx+1)"} |
| 415 | {"ln_scale":true} |
| 417 | — |
| 418 | {"scale":"1/sqrt(layer+1)"} |
| 421 | — |
| 442 | — |
| 445 | — |
| 450 | {"scale":"1/sqrt(layer_idx+1)"} |
| 453 | {"scale":"1/sqrt(layer_idx+1)"} |
| 455 | {"scale_factor":"1/sqrt(layer_idx+1)"} |
| 461 | {"formula":"1/sqrt(layer+1)"} |
| 478 | {"scale_rule":"1/sqrt(layer_idx+1)"} |
| 481 | — |
| 482 | {"scale":"1/sqrt(layer+1)"} |
| 485 | {"scale":"1/sqrt(layer+1)"} |
| 486 | — |
| 487 | — |
| 498 | {"scale":"1/sqrt(layer_idx+1)"} |
| 499 | {"scale":"1/sqrt(layer_idx+1)"} |
| 503 | — |
| 507 | {"scale":"1/sqrt(layer_idx+1)","applied_to":"RMSNorm inputs"} |
| 528 | — |
| 529 | — |
| 532 | {"scale":"1/sqrt(layer+1)"} |
| 534 | {"scale_rule":"1/sqrt(i+1)"} |
| 545 | {"scale":"1/sqrt(layer+1)"} |
| 549 | {"formula":"1/sqrt(layer+1)"} |
| 573 | {"scale":"1/sqrt(layer_idx+1)"} |
| 579 | {"scale_factor":"1/sqrt(layer_idx+1)"} |
| 589 | {"scale":"1/sqrt(layer+1)"} |
| 593 | {"scale":"1/sqrt(layer+1)"} |
| 634 | {"scale_factor":"1/sqrt(layer_idx+1)"} |
| 638 | — |
| 642 | — |
| 648 | {"per_loop_scale_bias":true} |
| 657 | {"scale":"1/sqrt(i+1)"} |
| 659 | {"formula":"1/sqrt(layer+1)"} |
| 661 | {"scale":"1/sqrt(layer+1)"} |
| 672 | — |
| 681 | {"scale":"1/sqrt(layer+1)"} |
| 682 | {"scale":"1/sqrt(layer_idx+1)"} |
| 688 | {"formula":"1/sqrt(layer+1)"} |
| 695 | {"scale":"1/sqrt(layer_idx+1)"} |
| 698 | — |
| 703 | {"scale":"1/sqrt(layer+1)"} |
| 710 | {"scale_rule":"1/sqrt(layer_idx+1)"} |
| 714 | {"formula":"1/sqrt(layer+1)"} |
| 720 | {"scale":"1/sqrt(layer+1)"} |
| 726 | {"scale":"1/sqrt(layer+1)"} |
| 728 | {"scale":"1/sqrt(layer+1)"} |
| 733 | {"scale":"1/sqrt(i+1)"} |
| 738 | {"formula":"1/sqrt(layer+1)"} |
| 741 | — |
| 752 | {"scale":"1/sqrt(layer+1)"} |
| 763 | {"scale":"1/sqrt(layer+1)"} |
| 770 | — |
| 771 | {"scale":"1/sqrt(layer+1)"} |
| 774 | {"scale":"1/sqrt(layer+1)"} |
| 779 | — |
| 785 | {"enabled":1} |
| 786 | {"formula":"1/sqrt(layer+1)"} |
| 794 | {"formula":"1/sqrt(layer+1)"} |
| 796 | {"scale":"1/sqrt(layer+1)"} |
| 797 | {"formula":"1/sqrt(layer+1)"} |
| 808 | {"formula":"1/sqrt(layer+1)"} |
| 826 | — |
| 827 | {"scale":"1/sqrt(layer+1)"} |
| 832 | {"enabled":true} |
| 834 | — |
| 838 | {"scale":"1/sqrt(layer_idx+1)"} |
| 841 | — |
| 855 | — |
| 872 | {"formula":"1/sqrt(layer+1)"} |
| 925 | {"scale":"1/sqrt(layer+1)"} |
| 991 | — |
| 1008 | — |
| 1077 | {"scale":"1/sqrt(layer+1)"} |
| 1096 | {"description":"Output-LN / Peri-LN on shared blocks"} |
| 1101 | {"scale":"1/sqrt(layer_idx + 1)"} |
| 1231 | {"scale":"1/sqrt(layer+1)"} |
| 1274 | {"ln_scale":1} |
| 1319 | {"scale":"1/sqrt(layer+1)"} |
| 1413 | — |
| 1467 | {"formula":"1/sqrt(layer+1)"} |
| 1492 | — |
| 1514 | — |
| 1515 | — |
| 1520 | — |
| 1536 | — |
| 1539 | {"scale_rule":"1/sqrt(layer+1)"} |
| 1541 | — |
| 1546 | — |
| 1555 | — |
| 1570 | — |
| 1586 | — |
| 1602 | — |
| 1616 | — |
| 1630 | {"scale":"1/sqrt(layer+1)"} |
| 1639 | — |
| 1658 | {"scale":"1/sqrt(layer+1)"} |
| 1661 | — |
| 1667 | {"formula":"1/sqrt(layer_idx+1)"} |
| 1670 | — |
| 1676 | — |
| 1688 | — |
| 1689 | — |
| 1693 | — |
| 1696 | {"scale":"1/sqrt(layer+1)"} |
| 1714 | — |
| 1715 | — |
| 1720 | — |
| 1737 | — |
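
The rule recorded in most rows above, `1/sqrt(layer_idx+1)`, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of any listed PR: the function names `ln_scale` and `scaled_rms_norm` are hypothetical, `layer_idx` is assumed to count from 0 (so the first layer is left unscaled), and the factor is assumed to multiply the normalized output. The table does not say where each PR applies it, and PR #507 lists "RMSNorm inputs" instead.

```python
import math


def ln_scale(layer_idx: int) -> float:
    """Depth-dependent factor 1/sqrt(layer_idx + 1).

    Assumption: layer_idx is zero-based, so layer 0 gets a factor of 1.0.
    """
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    return 1.0 / math.sqrt(layer_idx + 1)


def scaled_rms_norm(x, layer_idx, eps=1e-6):
    """RMS-normalize a vector, then damp it by the layer's factor.

    Assumption: the factor is applied to the normalized output; a learned
    gain, if any, is omitted to keep the sketch short.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    scale = ln_scale(layer_idx)
    return [scale * v / rms for v in x]


if __name__ == "__main__":
    # Deeper layers receive progressively smaller factors.
    for i in range(4):
        print(i, round(ln_scale(i), 4))
    # Layer 0 prints 1.0 and layer 3 prints 0.5.
```

Under these assumptions the output of layer `i` has a root-mean-square of roughly `1/sqrt(i+1)` rather than 1, which shrinks the contribution of deeper blocks.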