LN scale

Regularization
Used in: 127 PRs
Best BPB: 0.0280
Avg BPB: 1.0827

Submissions

PR #332 by saml212: 1.1320
PR #334 by nathon-lee: 1.2207
PR #374 by unnir (RECORD): 1.1246
PR #397 by translatingthename: 1.1364
PR #462 by JoeProAI: 1.0672
PR #477 by harsha-gouru: 1.1522
PR #816 by jimliu741523: 1.1194
PR #876 by Bortlesboat: 0.5863
PR #884 by BhatiaUday: 1.1448
PR #885 by lolrazh: 0.9958
PR #887 by anthony-maio: 0.9642
PR #889 by anthony-maio: 0.9642
PR #891 by robbiebusinessacc: 1.1428
PR #892 by robbiebusinessacc: 1.1428
PR #909 by sunnypatneedi: 0.8609
PR #912 by Bortlesboat: 0.3461
PR #915 by anthony-maio: 0.9642
PR #922 by greqone: 0.0972
PR #924 by THUQiXuan: 0.0280
PR #926 by NandhuRajRK: 0.8705
PR #932 by anthony-maio: 1.1580
PR #941 by aptsalt: 1.3620
PR #952 by FlashyFlash3011: 1.1144
PR #953 by dexhunter: 1.0722
PR #964 by vivekvar-dl: 1.3900
PR #965 by Adam-Jacuch: 1.1184
PR #967 by dexhunter: 1.0450
PR #975 by Abhishek8108: 1.1216
PR #999 by aamodbhatt: 1.1179
PR #1002 by SoHarshh: 1.1650
PR #1004 by ibarrajo: 1.1182
PR #1005 by OnlyJundong: 1.0853
PR #1009 by SoHarshh: 1.1574
PR #1013 by himanshudongre: 1.1682
PR #1016 by ADIITJ: 1.1269
PR #1019 by abaybektursun (RECORD): 1.1147
PR #1026 by danielxmed: 1.0945
PR #1032 by wfproc: 1.3631
PR #1033 by Naazimsnh02: 0.4311
PR #1039 by yufengli-oai: 1.1184
PR #1043 by okezue: 1.1261
PR #1051 by tejas-goyal: 1.2826
PR #1057 by Programmerryoki: 1.2201
PR #1060 by dexhunter: 1.1123
PR #1069 by manfromnowhere143: 1.1190
PR #1070 by manfromnowhere143: 1.1190
PR #1072 by vimeto: 1.1170
PR #1081 by michaelwinczuk: 1.1220
PR #1084 by AnubhavBharadwaaj: 1.1185
PR #1087 by Dhenenjay: 1.1407
PR #1092 by teddyoweh: 1.1219
PR #1099 by Bortlesboat: 1.1133
PR #1113 by gowtham0992: 1.3705
PR #1122 by icryo: 1.1146
PR #1123 by sisegod: 1.1986
PR #1125 by jainpranjal97: 1.1946
PR #1126 by AnirudhRahul: 1.1091
PR #1128 by AnubhavBharadwaaj: 1.1154
PR #1129 by EthanYangTW: 1.1174
PR #1130 by Gusanidas: 1.1140
PR #1148 by aamodbhatt: 1.1179
PR #1150 by sahiee-dev: 1.1151
PR #1159 by JDAppleseed: 0.3693
PR #1166 by Christopher-Lee-McClendon: 1.1347
PR #1169 by Bortlesboat: 1.1126
PR #1170 by Christopher-Lee-McClendon: 1.1199
PR #1171 by EthanYangTW: 1.1145
PR #1179 by dexhunter: 1.1105
PR #1184 by icryo: 0.9485
PR #1185 by skoustav35: 0.9641
PR #1202 by VirajDeshwal: 1.1412
PR #1209 by andrewbaggio1: 1.1064
PR #1216 by SoHarshh: 1.1574
PR #1227 by himanshudongre: 1.4841
PR #1228 by meinlebenswerk: 1.1527
PR #1230 by nestamidavaine: 1.1163
PR #1232 by Christopher-Lee-McClendon: 1.0929
PR #1236 by ibarrajo: 1.1179
PR #1240 by andrewbaggio1: 1.1064
PR #1242 by Campbellb: 1.0903
PR #1244 by monkeyKingProgrammer: 1.1443
PR #1247 by fahmitech: 1.2208
PR #1269 by Jtss-ux: 1.1194
PR #1276 by BiggerDABOSS: 1.1100
PR #1280 by aamodbhatt: 1.1156
PR #1284 by tyrel-beede: 1.1207
PR #1289 by MatoTeziTanka: 1.0819
PR #1290 by aryanbhosale: 1.1104
PR #1296 by aryanbhosale: 1.0926
PR #1298 by Omrigotlieb: 1.1043
PR #1302 by vlivashkin: 1.1078
PR #1303 by anthony-maio: 0.9462
PR #1311 by htrung1105: 1.1303
PR #1321 by anthony-maio: 0.7406
PR #1324 by yahya010: 0.8275
PR #1349 by LocalX991: 1.3693
PR #1361 by jorge-asenjo: 1.1220
PR #1364 by stukenov: 1.1025
PR #1376 by stukenov: 0.7094
PR #1378 by Rajat123456789: 1.1711
PR #1383 by nirmathur: 1.3151
PR #1386 by Buld1n: 1.1452
PR #1389 by Rome-1: 1.7270
PR #1392 by Its-Just-Crump: 1.1020
PR #1396 by erichroepke: 1.1067
PR #1405 by anthony-maio: 1.0856
PR #1406 by aamodbhatt: 1.0887
PR #1407 by OnlyJundong: 1.0960
PR #1408 by aamodbhatt: 1.0800
PR #1414 by Abhishek8108: 0.7093
PR #1420 by abaybektursun: 1.0801
PR #1424 by OnlyJundong: 1.0858
PR #1446 by LauraGomezjurado: 1.0960
PR #1456 by sisegod: 1.1465
PR #1457 by DilpreetBansi: 1.1454
PR #1473 by AVINASH0052: 1.1156
PR #1517 by RulinShao: 1.0632
PR #1528 by xiehuanyi: 1.1104
PR #1538 by davie2009kh: 1.1180
PR #1548 by dljr-github: 1.3220
PR #1549 by dljr-github: 1.3220
PR #1550 by translatingthename: 1.0587
PR #1573 by shivangbaveja: 1.1464
PR #1619 by AVINASH0052: 1.1156
PR #1621 by mrbese: 1.1531
PR #1666 by mrbese: 1.1531
PR #1675 by jayzuccarelli: 1.1451

Hyperparameters Across PRs

pr_number  parameters
332        {"scale_rule":"1/sqrt(layer_idx+1)"}
334        {"formula":"RMSNorm damped by 1/sqrt(layer+1)"}
374        {"scale_factor":"1/sqrt(layer_idx+1)"}
397
462        {"scale":"1/sqrt(layer_idx+1)"}
477        {"scale":"1/sqrt(layer+1)"}
816        {"scale_rule":"1/sqrt(layer+1)"}
876
884        {"schedule":"1/sqrt(layer+1)"}
885
887
889
891        {"scale":"1/sqrt(layer)"}
892        {"scale":"1/sqrt(layer)"}
909        {"formula":"1/sqrt(layer+1)"}
912
915
922
924        {"scale":"1/sqrt(layer+1)"}
926
932
941        {"scale":"1/sqrt(layer+1)"}
952        {"formula":"1/sqrt(layer+1)"}
953
964
965        {"ln_scale":1}
967
975
999        {"scale":"1/sqrt(layer+1)"}
1002       {"formula":"1/sqrt(layer+1)"}
1004
1005       {"scale":"1/sqrt(layer+1)"}
1009       {"scale":"1/sqrt(layer+1)"}
1013       {"ln_scale_factor":true}
1016       {"scale":"1/sqrt(L+1)"}
1019       {"scale":"1/sqrt(layer+1)"}
1026       {"scale":"1/sqrt(layer+1)"}
1032       {"formula":"1/sqrt(L+1)"}
1033       {"scale":"1/sqrt(layer+1)"}
1039       {"enabled":true}
1043       {"formula":"1/sqrt(layer+1)"}
1051
1057       {"scale":"1/sqrt(layer+1)"}
1060
1069
1070       {"scale":"1/sqrt(layer+1)"}
1072       {"formula":"1/sqrt(layer+1)"}
1081
1084       {"enabled":true}
1087       {"scale":"1/sqrt(l+1)"}
1092       {"enabled":true}
1099
1113
1122       {"scale":"1/sqrt(l+1)"}
1123       {"formula":"1/sqrt(layer+1)"}
1125       {"scale":"1/sqrt(layer+1)"}
1126       {"scale":"1/sqrt(layer_idx + 1)"}
1128
1129
1130       {"scale":"1/sqrt(layer+1)"}
1148       {"formula":"1/sqrt(layer+1)"}
1150
1159       {"enabled":1}
1166
1169
1170       {"value":1}
1171       {"scale":"1/sqrt(layer+1)"}
1179       {"scale":"1/sqrt(layer+1)"}
1184       {"formula":"1/sqrt(l+1)"}
1185
1202       {"scale":"1/sqrt(layer+1)"}
1209
1216       {"formula":"1/sqrt(layer+1)"}
1227
1228
1230       {"scale":"1/sqrt(layer+1)"}
1232       {"formula":"1/sqrt(layer+1)"}
1236
1240
1242       {"enabled":true}
1244
1247       {"scale":"1/sqrt(layer_idx+1)"}
1269       {"scale":"1/sqrt(layer+1)"}
1276       {"scale":"1/sqrt(layer+1)"}
1280       {"rule":"1/sqrt(layer+1)"}
1284       {"scale":"1/sqrt(layer+1)"}
1289
1290       {"scale":"1/sqrt(layer+1)"}
1296
1298       {"scale":"1/sqrt(layer+1)"}
1302       {"scale":"1/sqrt(layer+1)"}
1303
1311       {"formula":"1/sqrt(layer_idx+1)"}
1321
1324
1349
1361       {"rule":"1/sqrt(layer_idx+1)"}
1364       {"rule":"1/sqrt(layer+1)"}
1376
1378       {"scale":"1/sqrt(layer+1)"}
1383
1386
1389       {"scale":"1/sqrt(layer+1)"}
1392       {"scale":"1/sqrt(layer+1)"}
1396       {"enabled":true}
1405
1406       {"scale":"1/sqrt(layer+1)"}
1407       {"scale":"1/sqrt(layer+1)"}
1408       {"scale":"1/sqrt(layer+1)"}
1414
1420       {"scale":"1/sqrt(layer+1)"}
1424       {"scale":"1/sqrt(layer+1)"}
1446       {"scale":"1/sqrt(layer+1)"}
1456       {"scale_rule":"1/sqrt(layer+1)"}
1457       {"scale":"1/sqrt(layer+1)"}
1473       {"scale":"1/sqrt(L+1)"}
1517       {"formula":"1/sqrt(layer+1)"}
1528
1538       {"formula":"1/sqrt(layer+1)"}
1548       {"scale":"1/sqrt(layer_idx+1)"}
1549       {"scale":"1/sqrt(layer+1)"}
1550       {"scale":"1/sqrt(layer+1)"}
1573       {"formula":"1/sqrt(layer+1)"}
1619       {"scale":"1/sqrt(L+1)"}
1621       {"enabled":true}
1666       {"enabled":true}
1675       {"attn_scale_init":"1/sqrt(layer_idx+1)","mlp_scale_init":"1/sqrt(layer_idx+1)"}
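
The rule most rows record, 1/sqrt(layer+1), can be sketched as follows. This is a minimal illustration in plain Python, not code taken from any listed PR. The names ln_scale, rms_norm and scaled_rms_norm are hypothetical, the layer index is assumed to be 0-based, and applying the factor to an RMSNorm output is an assumption based only on the wording of the PR #334 row.

```python
import math


def ln_scale(layer_idx: int) -> float:
    """Per-layer damping factor 1/sqrt(layer_idx + 1), assuming a 0-based index."""
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    return 1.0 / math.sqrt(layer_idx + 1)


def rms_norm(x, eps=1e-6):
    """Plain RMSNorm without a learned gain: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def scaled_rms_norm(x, layer_idx, eps=1e-6):
    """RMSNorm output multiplied by the layer's damping factor (assumed placement)."""
    scale = ln_scale(layer_idx)
    return [v * scale for v in rms_norm(x, eps)]


if __name__ == "__main__":
    # Print the factor for the first few layers of a hypothetical model.
    for idx in range(4):
        print(idx, ln_scale(idx))
```

Under these assumptions the first layer keeps a factor of 1 and deeper layers are damped progressively. The rows for PRs #891 and #892 list 1/sqrt(layer) without the +1; the table does not say whether those use a 1-based layer index, and with a 0-based index that form would divide by zero at the first layer.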