layerwise LN scale

Category: Regularization
Used in: 124 PRs
Best BPB: 0.0281
Avg BPB: 1.0765

Submissions

PR #175 by anthony-maio: 1.1229
PR #315 by jfprincz: 1.1248
PR #325 by Aum08Desai: 1.1462
PR #376 by anthony-maio: 1.1399
PR #379 by dannywillowliu-uchi: 1.1257
PR #383 by joelnishanth: 1.1320
PR #388 by ElliotSlusky: 1.1231
PR #389 by trasnake87: 1.1466
PR #398 by felipe-parodi: 1.1213
PR #400 by chanwoo-park-official: 1.1296
PR #401 by newjordan: 1.1243
PR #410 by EthanYangTW: 1.1216
PR #414 by signalrush: 1.1233
PR #415 by EthanYangTW: 1.1216
PR #417 by EthanYangTW: 1.1227
PR #418 by yashverms: 1.1715
PR #421 by vytautas-bunevicius: 1.1466
PR #442 by sjp611: 1.1027
PR #445 by newjordan: 1.1236
PR #450 by zachgoldfine44: 1.1466
PR #453 by Divyesh-Thirukonda: 1.1248
PR #455 by kasimte: 1.1299
PR #461 by Christopher-Lee-McClendon: 1.1446
PR #478 by gowtham0992: 1.1268
PR #481 by mrdavtan: 1.0970
PR #482 by harsha-gouru: 1.1522
PR #485 by harsha-gouru: 1.1522
PR #486 by ndokutovich: 1.1101
PR #487 by anantdgoel: 1.1720
PR #498 by newjordan: 1.1478
PR #499 by newjordan: 1.1478
PR #503 by EthanYangTW: 1.1195
PR #507 by skarakulak: 1.1558
PR #528 by EthanYangTW: 1.1195
PR #529 by EthanYangTW: 1.1195
PR #532 by NotADevIAmaMeatPopsicle: 1.0487
PR #534 by rarce: 1.1804
PR #545 by EthanYangTW: 1.1179
PR #549 by abaybektursun (RECORD): 1.1194
PR #573 by Sarimsaljook: 1.0523
PR #579 by newjordan: 1.1355
PR #589 by RoyiRa: 1.1178
PR #593 by abaybektursun: 1.1163
PR #634 by raahilshah: 1.1171
PR #638 by Asukabot0: 1.1164
PR #642 by minh-stakc: 0.8173
PR #648 by maorinka: 1.1428
PR #657 by anthony-maio: 1.1234
PR #659 by deanbrr: 1.0920
PR #661 by andrewbaggio1: 1.1175
PR #672 by andrewbaggio1: 1.0781
PR #681 by Alfaxad: 1.4775
PR #682 by gthgomez: 1.1233
PR #688 by RoyiRa: 1.0745
PR #695 by 0xNoramiya: 1.1360
PR #698 by hesong0222-dev: 1.1642
PR #703 by Gusanidas: 1.1176
PR #710 by Dhruba531: 1.1240
PR #714 by Upsalla: 1.1187
PR #720 by agalimova: 1.1078
PR #726 by DeepReinforce: 1.1147
PR #728 by abaybektursun: 1.1142
PR #733 by stukenov: 1.0278
PR #738 by gowtham0992: 1.0970
PR #741 by andrewbaggio1: 0.9850
PR #752 by Naazimsnh02: 1.1182
PR #763 by hypery11: 0.9917
PR #770 by minh-stakc: 0.6672
PR #771 by sunnypatneedi: 1.0705
PR #774 by travispchen: 0.9370
PR #779 by deanbrr: 0.6683
PR #785 by SirSaltySalmon: 1.5364
PR #786 by shinegami-2002: 0.8128
PR #794 by jeremyschied: 1.3346
PR #796 by Robby955: 0.6567
PR #797 by armantsaturian: 0.8960
PR #808 by Naazimsnh02: 0.6364
PR #826 by himanshudongre: 0.2951
PR #827 by Programmerryoki: 1.3999
PR #832 by jfprincz: 1.1903
PR #834 by AnirudhRahul: 0.1663
PR #838 by aryanbhosale: 1.1215
PR #841 by someone114514: 1.1157
PR #855 by aazizyan: 1.2659
PR #872 by gowtham0992: 1.0467
PR #925 by THUQiXuan: 0.0281
PR #991 by ibarrajo: 1.1145
PR #1008 by monkeyKingProgrammer: 1.1538
PR #1077 by malc3om: 1.1130
PR #1096 by vimeto: 1.3342
PR #1101 by amrayach: 1.1290
PR #1231 by nestamidavaine: 1.1163
PR #1274 by MatoTeziTanka: 1.0876
PR #1319 by canivel: 0.6951
PR #1413 by dexhunter (RECORD): 1.0828
PR #1467 by PhamPhuHoa-23: 1.1056
PR #1492 by bigbag: 1.0810
PR #1514 by dexhunter: 1.0798
PR #1515 by dexhunter: 1.0872
PR #1520 by taka6745: 1.0824
PR #1536 by dexhunter: 1.0775
PR #1539 by translatingthename: 1.0587
PR #1541 by bigbag: 1.0778
PR #1546 by SPThole: 1.0850
PR #1555 by andrewbaggio1: 1.0764
PR #1570 by yufang67: 1.0970
PR #1586 by dexhunter: 1.0749
PR #1602 by SPThole: 1.0744
PR #1616 by Vickyrrrrrr: 1.4100
PR #1630 by KevinChunye: 1.1412
PR #1639 by kunwar-vikrant: 1.0832
PR #1658 by AVINASH0052: 1.0810
PR #1661 by anderamondarainh-stack: 1.1444
PR #1667 by MarioPaerle: 1.0714
PR #1670 by dexhunter: 1.0597
PR #1676 by aazizyan: 1.0788
PR #1688 by Buld1n: 1.0809
PR #1689 by chris-colinsky: 1.0822
PR #1693 by dexhunter: 1.0573
PR #1696 by kings-crown: 1.1224
PR #1714 by Anakintano: 1.0857
PR #1715 by G3sparky: 1.0809
PR #1720 by kiyoaki: 1.0818
PR #1737 by sakthivarshans: 1.0723

Hyperparameters Across PRs

pr_number  parameters
175  {"scale":"1/sqrt(i+1)"}
315  {"scale":"1/sqrt(layer_idx+1)"}
325  {"ln_scale":1}
376  {"formula":"1/sqrt(layer_idx+1)"}
379  {"scale_rule":"1/sqrt(layer_idx+1)"}
383  {"scale":"1/sqrt(layer_idx+1)"}
388  {"scale_factor":"1/sqrt(layer_idx+1)"}
389  {"scale":"1/sqrt(layer_idx+1)"}
398  {"scale":"1/sqrt(layer+1)"}
400  {"enabled":true}
401  {"scale_rule":"1/sqrt(layer_idx+1)"}
410  {"ln_scale":true}
414  {"scale":"1/sqrt(layer_idx+1)"}
415  {"ln_scale":true}
417
418  {"scale":"1/sqrt(layer+1)"}
421
442
445
450  {"scale":"1/sqrt(layer_idx+1)"}
453  {"scale":"1/sqrt(layer_idx+1)"}
455  {"scale_factor":"1/sqrt(layer_idx+1)"}
461  {"formula":"1/sqrt(layer+1)"}
478  {"scale_rule":"1/sqrt(layer_idx+1)"}
481
482  {"scale":"1/sqrt(layer+1)"}
485  {"scale":"1/sqrt(layer+1)"}
486
487
498  {"scale":"1/sqrt(layer_idx+1)"}
499  {"scale":"1/sqrt(layer_idx+1)"}
503
507  {"scale":"1/sqrt(layer_idx+1)","applied_to":"RMSNorm inputs"}
528
529
532  {"scale":"1/sqrt(layer+1)"}
534  {"scale_rule":"1/sqrt(i+1)"}
545  {"scale":"1/sqrt(layer+1)"}
549  {"formula":"1/sqrt(layer+1)"}
573  {"scale":"1/sqrt(layer_idx+1)"}
579  {"scale_factor":"1/sqrt(layer_idx+1)"}
589  {"scale":"1/sqrt(layer+1)"}
593  {"scale":"1/sqrt(layer+1)"}
634  {"scale_factor":"1/sqrt(layer_idx+1)"}
638
642
648  {"per_loop_scale_bias":true}
657  {"scale":"1/sqrt(i+1)"}
659  {"formula":"1/sqrt(layer+1)"}
661  {"scale":"1/sqrt(layer+1)"}
672
681  {"scale":"1/sqrt(layer+1)"}
682  {"scale":"1/sqrt(layer_idx+1)"}
688  {"formula":"1/sqrt(layer+1)"}
695  {"scale":"1/sqrt(layer_idx+1)"}
698
703  {"scale":"1/sqrt(layer+1)"}
710  {"scale_rule":"1/sqrt(layer_idx+1)"}
714  {"formula":"1/sqrt(layer+1)"}
720  {"scale":"1/sqrt(layer+1)"}
726  {"scale":"1/sqrt(layer+1)"}
728  {"scale":"1/sqrt(layer+1)"}
733  {"scale":"1/sqrt(i+1)"}
738  {"formula":"1/sqrt(layer+1)"}
741
752  {"scale":"1/sqrt(layer+1)"}
763  {"scale":"1/sqrt(layer+1)"}
770
771  {"scale":"1/sqrt(layer+1)"}
774  {"scale":"1/sqrt(layer+1)"}
779
785  {"enabled":1}
786  {"formula":"1/sqrt(layer+1)"}
794  {"formula":"1/sqrt(layer+1)"}
796  {"scale":"1/sqrt(layer+1)"}
797  {"formula":"1/sqrt(layer+1)"}
808  {"formula":"1/sqrt(layer+1)"}
826
827  {"scale":"1/sqrt(layer+1)"}
832  {"enabled":true}
834
838  {"scale":"1/sqrt(layer_idx+1)"}
841
855
872  {"formula":"1/sqrt(layer+1)"}
925  {"scale":"1/sqrt(layer+1)"}
991
1008
1077  {"scale":"1/sqrt(layer+1)"}
1096  {"description":"Output-LN / Peri-LN on shared blocks"}
1101  {"scale":"1/sqrt(layer_idx + 1)"}
1231  {"scale":"1/sqrt(layer+1)"}
1274  {"ln_scale":1}
1319  {"scale":"1/sqrt(layer+1)"}
1413
1467  {"formula":"1/sqrt(layer+1)"}
1492
1514
1515
1520
1536
1539  {"scale_rule":"1/sqrt(layer+1)"}
1541
1546
1555
1570
1586
1602
1616
1630  {"scale":"1/sqrt(layer+1)"}
1639
1658  {"scale":"1/sqrt(layer+1)"}
1661
1667  {"formula":"1/sqrt(layer_idx+1)"}
1670
1676
1688
1689
1693
1696  {"scale":"1/sqrt(layer+1)"}
1714
1715
1720
1737
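
Most rows above give the same rule, 1/sqrt(layer_idx+1). The following is a minimal sketch of what that rule could look like in code, not any submission's actual implementation. It assumes a zero-based layer index and assumes the factor multiplies the output of an RMSNorm; the table itself only states the formula (and, for PR #507, "RMSNorm inputs"). The names `ln_scale`, `rms_norm`, and `scaled_norm` are hypothetical.

```python
import math


def ln_scale(layer_idx: int) -> float:
    """Layerwise scale 1/sqrt(layer_idx + 1); layer_idx assumed zero-based."""
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    return 1.0 / math.sqrt(layer_idx + 1)


def rms_norm(x, eps=1e-6):
    """Plain RMSNorm over a list of floats, with no learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def scaled_norm(x, layer_idx, eps=1e-6):
    """Normalize x, then shrink it by the layer's scale.

    Assumption: the factor is applied to the normalized output.
    """
    s = ln_scale(layer_idx)
    return [v * s for v in rms_norm(x, eps)]


# Scale factors for the first four layers of a hypothetical model.
scales = [ln_scale(i) for i in range(4)]
```

Under these assumptions the first layer is left unscaled and deeper layers receive progressively smaller normalized activations, so the fourth layer (index 3) sees its normalized values halved.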