LN scale

Category: Regularization · Used in: 127 PRs · Best BPB: 0.0280 · Avg BPB: 1.0827
Submissions

| PR | Author | BPB | Note |
|---|---|---|---|
| #332 | saml212 | 1.1320 | |
| #334 | nathon-lee | 1.2207 | |
| #374 | unnir | 1.1246 | RECORD |
| #397 | translatingthename | 1.1364 | |
| #462 | JoeProAI | 1.0672 | |
| #477 | harsha-gouru | 1.1522 | |
| #816 | jimliu741523 | 1.1194 | |
| #876 | Bortlesboat | 0.5863 | |
| #884 | BhatiaUday | 1.1448 | |
| #885 | lolrazh | 0.9958 | |
| #887 | anthony-maio | 0.9642 | |
| #889 | anthony-maio | 0.9642 | |
| #891 | robbiebusinessacc | 1.1428 | |
| #892 | robbiebusinessacc | 1.1428 | |
| #909 | sunnypatneedi | 0.8609 | |
| #912 | Bortlesboat | 0.3461 | |
| #915 | anthony-maio | 0.9642 | |
| #922 | greqone | 0.0972 | |
| #924 | THUQiXuan | 0.0280 | |
| #926 | NandhuRajRK | 0.8705 | |
| #932 | anthony-maio | 1.1580 | |
| #941 | aptsalt | 1.3620 | |
| #952 | FlashyFlash3011 | 1.1144 | |
| #953 | dexhunter | 1.0722 | |
| #964 | vivekvar-dl | 1.3900 | |
| #965 | Adam-Jacuch | 1.1184 | |
| #967 | dexhunter | 1.0450 | |
| #975 | Abhishek8108 | 1.1216 | |
| #999 | aamodbhatt | 1.1179 | |
| #1002 | SoHarshh | 1.1650 | |
| #1004 | ibarrajo | 1.1182 | |
| #1005 | OnlyJundong | 1.0853 | |
| #1009 | SoHarshh | 1.1574 | |
| #1013 | himanshudongre | 1.1682 | |
| #1016 | ADIITJ | 1.1269 | |
| #1019 | abaybektursun | 1.1147 | RECORD |
| #1026 | danielxmed | 1.0945 | |
| #1032 | wfproc | 1.3631 | |
| #1033 | Naazimsnh02 | 0.4311 | |
| #1039 | yufengli-oai | 1.1184 | |
| #1043 | okezue | 1.1261 | |
| #1051 | tejas-goyal | 1.2826 | |
| #1057 | Programmerryoki | 1.2201 | |
| #1060 | dexhunter | 1.1123 | |
| #1069 | manfromnowhere143 | 1.1190 | |
| #1070 | manfromnowhere143 | 1.1190 | |
| #1072 | vimeto | 1.1170 | |
| #1081 | michaelwinczuk | 1.1220 | |
| #1084 | AnubhavBharadwaaj | 1.1185 | |
| #1087 | Dhenenjay | 1.1407 | |
| #1092 | teddyoweh | 1.1219 | |
| #1099 | Bortlesboat | 1.1133 | |
| #1113 | gowtham0992 | 1.3705 | |
| #1122 | icryo | 1.1146 | |
| #1123 | sisegod | 1.1986 | |
| #1125 | jainpranjal97 | 1.1946 | |
| #1126 | AnirudhRahul | 1.1091 | |
| #1128 | AnubhavBharadwaaj | 1.1154 | |
| #1129 | EthanYangTW | 1.1174 | |
| #1130 | Gusanidas | 1.1140 | |
| #1148 | aamodbhatt | 1.1179 | |
| #1150 | sahiee-dev | 1.1151 | |
| #1159 | JDAppleseed | 0.3693 | |
| #1166 | Christopher-Lee-McClendon | 1.1347 | |
| #1169 | Bortlesboat | 1.1126 | |
| #1170 | Christopher-Lee-McClendon | 1.1199 | |
| #1171 | EthanYangTW | 1.1145 | |
| #1179 | dexhunter | 1.1105 | |
| #1184 | icryo | 0.9485 | |
| #1185 | skoustav35 | 0.9641 | |
| #1202 | VirajDeshwal | 1.1412 | |
| #1209 | andrewbaggio1 | 1.1064 | |
| #1216 | SoHarshh | 1.1574 | |
| #1227 | himanshudongre | 1.4841 | |
| #1228 | meinlebenswerk | 1.1527 | |
| #1230 | nestamidavaine | 1.1163 | |
| #1232 | Christopher-Lee-McClendon | 1.0929 | |
| #1236 | ibarrajo | 1.1179 | |
| #1240 | andrewbaggio1 | 1.1064 | |
| #1242 | Campbellb | 1.0903 | |
| #1244 | monkeyKingProgrammer | 1.1443 | |
| #1247 | fahmitech | 1.2208 | |
| #1269 | Jtss-ux | 1.1194 | |
| #1276 | BiggerDABOSS | 1.1100 | |
| #1280 | aamodbhatt | 1.1156 | |
| #1284 | tyrel-beede | 1.1207 | |
| #1289 | MatoTeziTanka | 1.0819 | |
| #1290 | aryanbhosale | 1.1104 | |
| #1296 | aryanbhosale | 1.0926 | |
| #1298 | Omrigotlieb | 1.1043 | |
| #1302 | vlivashkin | 1.1078 | |
| #1303 | anthony-maio | 0.9462 | |
| #1311 | htrung1105 | 1.1303 | |
| #1321 | anthony-maio | 0.7406 | |
| #1324 | yahya010 | 0.8275 | |
| #1349 | LocalX991 | 1.3693 | |
| #1361 | jorge-asenjo | 1.1220 | |
| #1364 | stukenov | 1.1025 | |
| #1376 | stukenov | 0.7094 | |
| #1378 | Rajat123456789 | 1.1711 | |
| #1383 | nirmathur | 1.3151 | |
| #1386 | Buld1n | 1.1452 | |
| #1389 | Rome-1 | 1.7270 | |
| #1392 | Its-Just-Crump | 1.1020 | |
| #1396 | erichroepke | 1.1067 | |
| #1405 | anthony-maio | 1.0856 | |
| #1406 | aamodbhatt | 1.0887 | |
| #1407 | OnlyJundong | 1.0960 | |
| #1408 | aamodbhatt | 1.0800 | |
| #1414 | Abhishek8108 | 0.7093 | |
| #1420 | abaybektursun | 1.0801 | |
| #1424 | OnlyJundong | 1.0858 | |
| #1446 | LauraGomezjurado | 1.0960 | |
| #1456 | sisegod | 1.1465 | |
| #1457 | DilpreetBansi | 1.1454 | |
| #1473 | AVINASH0052 | 1.1156 | |
| #1517 | RulinShao | 1.0632 | |
| #1528 | xiehuanyi | 1.1104 | |
| #1538 | davie2009kh | 1.1180 | |
| #1548 | dljr-github | 1.3220 | |
| #1549 | dljr-github | 1.3220 | |
| #1550 | translatingthename | 1.0587 | |
| #1573 | shivangbaveja | 1.1464 | |
| #1619 | AVINASH0052 | 1.1156 | |
| #1621 | mrbese | 1.1531 | |
| #1666 | mrbese | 1.1531 | |
| #1675 | jayzuccarelli | 1.1451 | |

Hyperparameters Across PRs

| pr_number | parameters |
|---|---|
| 332 | {"scale_rule":"1/sqrt(layer_idx+1)"} |
| 334 | {"formula":"RMSNorm damped by 1/sqrt(layer+1)"} |
| 374 | {"scale_factor":"1/sqrt(layer_idx+1)"} |
| 397 | — |
| 462 | {"scale":"1/sqrt(layer_idx+1)"} |
| 477 | {"scale":"1/sqrt(layer+1)"} |
| 816 | {"scale_rule":"1/sqrt(layer+1)"} |
| 876 | — |
| 884 | {"schedule":"1/sqrt(layer+1)"} |
| 885 | — |
| 887 | — |
| 889 | — |
| 891 | {"scale":"1/sqrt(layer)"} |
| 892 | {"scale":"1/sqrt(layer)"} |
| 909 | {"formula":"1/sqrt(layer+1)"} |
| 912 | — |
| 915 | — |
| 922 | — |
| 924 | {"scale":"1/sqrt(layer+1)"} |
| 926 | — |
| 932 | — |
| 941 | {"scale":"1/sqrt(layer+1)"} |
| 952 | {"formula":"1/sqrt(layer+1)"} |
| 953 | — |
| 964 | — |
| 965 | {"ln_scale":1} |
| 967 | — |
| 975 | — |
| 999 | {"scale":"1/sqrt(layer+1)"} |
| 1002 | {"formula":"1/sqrt(layer+1)"} |
| 1004 | — |
| 1005 | {"scale":"1/sqrt(layer+1)"} |
| 1009 | {"scale":"1/sqrt(layer+1)"} |
| 1013 | {"ln_scale_factor":true} |
| 1016 | {"scale":"1/sqrt(L+1)"} |
| 1019 | {"scale":"1/sqrt(layer+1)"} |
| 1026 | {"scale":"1/sqrt(layer+1)"} |
| 1032 | {"formula":"1/sqrt(L+1)"} |
| 1033 | {"scale":"1/sqrt(layer+1)"} |
| 1039 | {"enabled":true} |
| 1043 | {"formula":"1/sqrt(layer+1)"} |
| 1051 | — |
| 1057 | {"scale":"1/sqrt(layer+1)"} |
| 1060 | — |
| 1069 | — |
| 1070 | {"scale":"1/sqrt(layer+1)"} |
| 1072 | {"formula":"1/sqrt(layer+1)"} |
| 1081 | — |
| 1084 | {"enabled":true} |
| 1087 | {"scale":"1/sqrt(l+1)"} |
| 1092 | {"enabled":true} |
| 1099 | — |
| 1113 | — |
| 1122 | {"scale":"1/sqrt(l+1)"} |
| 1123 | {"formula":"1/sqrt(layer+1)"} |
| 1125 | {"scale":"1/sqrt(layer+1)"} |
| 1126 | {"scale":"1/sqrt(layer_idx + 1)"} |
| 1128 | — |
| 1129 | — |
| 1130 | {"scale":"1/sqrt(layer+1)"} |
| 1148 | {"formula":"1/sqrt(layer+1)"} |
| 1150 | — |
| 1159 | {"enabled":1} |
| 1166 | — |
| 1169 | — |
| 1170 | {"value":1} |
| 1171 | {"scale":"1/sqrt(layer+1)"} |
| 1179 | {"scale":"1/sqrt(layer+1)"} |
| 1184 | {"formula":"1/sqrt(l+1)"} |
| 1185 | — |
| 1202 | {"scale":"1/sqrt(layer+1)"} |
| 1209 | — |
| 1216 | {"formula":"1/sqrt(layer+1)"} |
| 1227 | — |
| 1228 | — |
| 1230 | {"scale":"1/sqrt(layer+1)"} |
| 1232 | {"formula":"1/sqrt(layer+1)"} |
| 1236 | — |
| 1240 | — |
| 1242 | {"enabled":true} |
| 1244 | — |
| 1247 | {"scale":"1/sqrt(layer_idx+1)"} |
| 1269 | {"scale":"1/sqrt(layer+1)"} |
| 1276 | {"scale":"1/sqrt(layer+1)"} |
| 1280 | {"rule":"1/sqrt(layer+1)"} |
| 1284 | {"scale":"1/sqrt(layer+1)"} |
| 1289 | — |
| 1290 | {"scale":"1/sqrt(layer+1)"} |
| 1296 | — |
| 1298 | {"scale":"1/sqrt(layer+1)"} |
| 1302 | {"scale":"1/sqrt(layer+1)"} |
| 1303 | — |
| 1311 | {"formula":"1/sqrt(layer_idx+1)"} |
| 1321 | — |
| 1324 | — |
| 1349 | — |
| 1361 | {"rule":"1/sqrt(layer_idx+1)"} |
| 1364 | {"rule":"1/sqrt(layer+1)"} |
| 1376 | — |
| 1378 | {"scale":"1/sqrt(layer+1)"} |
| 1383 | — |
| 1386 | — |
| 1389 | {"scale":"1/sqrt(layer+1)"} |
| 1392 | {"scale":"1/sqrt(layer+1)"} |
| 1396 | {"enabled":true} |
| 1405 | — |
| 1406 | {"scale":"1/sqrt(layer+1)"} |
| 1407 | {"scale":"1/sqrt(layer+1)"} |
| 1408 | {"scale":"1/sqrt(layer+1)"} |
| 1414 | — |
| 1420 | {"scale":"1/sqrt(layer+1)"} |
| 1424 | {"scale":"1/sqrt(layer+1)"} |
| 1446 | {"scale":"1/sqrt(layer+1)"} |
| 1456 | {"scale_rule":"1/sqrt(layer+1)"} |
| 1457 | {"scale":"1/sqrt(layer+1)"} |
| 1473 | {"scale":"1/sqrt(L+1)"} |
| 1517 | {"formula":"1/sqrt(layer+1)"} |
| 1528 | — |
| 1538 | {"formula":"1/sqrt(layer+1)"} |
| 1548 | {"scale":"1/sqrt(layer_idx+1)"} |
| 1549 | {"scale":"1/sqrt(layer+1)"} |
| 1550 | {"scale":"1/sqrt(layer+1)"} |
| 1573 | {"formula":"1/sqrt(layer+1)"} |
| 1619 | {"scale":"1/sqrt(L+1)"} |
| 1621 | {"enabled":true} |
| 1666 | {"enabled":true} |
| 1675 | {"attn_scale_init":"1/sqrt(layer_idx+1)","mlp_scale_init":"1/sqrt(layer_idx+1)"} |
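Most rows above record the same rule: the normalization output in a block is multiplied by 1/sqrt(layer+1), which PR #334 describes as "RMSNorm damped by 1/sqrt(layer+1)". The pure-Python sketch below illustrates that reading. It assumes 0-indexed layers and an RMSNorm without a learned gain; the helper names `ln_scale`, `rms_norm` and `scaled_rms_norm` are hypothetical and not taken from any PR. Where the damped output is consumed also varies: some submissions, e.g. PR #1675 with its separate `attn_scale_init` and `mlp_scale_init` keys, appear to use the factor only as the initial value of per-branch scales.

```python
import math
from typing import List, Sequence


def ln_scale(layer_idx: int) -> float:
    """Depth-dependent damping factor 1/sqrt(layer_idx + 1).

    Assumes 0-indexed layers, so the first block gets a factor of 1.0.
    """
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    return 1.0 / math.sqrt(layer_idx + 1)


def rms_norm(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Plain RMSNorm with no learned gain: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def scaled_rms_norm(x: Sequence[float], layer_idx: int, eps: float = 1e-6) -> List[float]:
    """RMSNorm output damped by the layer's LN scale factor (hypothetical helper)."""
    s = ln_scale(layer_idx)
    return [s * v for v in rms_norm(x, eps)]


if __name__ == "__main__":
    # Print the damping schedule for the first few (0-indexed) layers.
    for i in range(4):
        print(i, round(ln_scale(i), 4))
```

With 0-indexed layers the first block is left unscaled and the factor falls to 0.5 by the fourth block, so the normalized activations fed to deeper blocks have a smaller RMS. The rows written as `1/sqrt(layer)` (PRs #891 and #892) are presumably the same schedule counted from 1, though that is an assumption.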