
RoPE

Architecture
Used in: 150 PRs
Best BPB: 0.0180
Avg BPB: 1.1576

Submissions

PR #59 by notapplica: 1.2160
PR #61 by saml212: 1.2154
PR #69 by TevBenji: 1.1708
PR #85 by hydeh3r3: 1.2156
PR #86 by aruniyer (RECORD): 1.1502
PR #120 by andrewgcodes: 0.9588
PR #126 by Athenox14: 1.7510
PR #139 by ksang123: 1.2029
PR #156 by dexhunter: 1.1602
PR #160 by ChaseWNorton: 1.1623
PR #163 by Focus2321: 1.2091
PR #186 by mahsumaktas: 1.1565
PR #198 by jfprincz (RECORD): 1.1318
PR #201 by machdragon: 1.1551
PR #206 by dexhunter: 1.1507
PR #215 by JayCheng113: 1.1548
PR #216 by alons23: 0.8100
PR #218 by bopmite: 1.1248
PR #223 by 0xjaishy: 1.1326
PR #231 by lenguyen1807: 1.2036
PR #243 by kvmukilan: 1.1704
PR #249 by kvmukilan: 1.1704
PR #254 by timowhite88: 1.1303
PR #265 by unnir: 1.1307
PR #281 by charmquark1984: 1.1381
PR #287 by jfprincz (RECORD): 1.1271
PR #290 by ibarrajo: 1.1354
PR #298 by MrINVISO: 1.2271
PR #305 by Naazimsnh02: 1.1672
PR #318 by sseanliu: 1.1284
PR #324 by crony-io: 1.1702
PR #325 by Aum08Desai: 1.1462
PR #326 by crony-io: 1.2890
PR #330 by bopmite: 1.1609
PR #333 by mahsumaktas: 1.1565
PR #367 by ksang123: 1.1770
PR #369 by signalrush: 1.1328
PR #379 by dannywillowliu-uchi: 1.1257
PR #385 by dentity007: 1.1488
PR #393 by CrimsonSithria: 1.2417
PR #394 by greqone: 1.1247
PR #408 by markste-in: 1.4784
PR #422 by albertorkive: 1.1396
PR #436 by CrimsonSithria: 1.2392
PR #442 by sjp611: 1.1027
PR #467 by ADIITJ: 1.1428
PR #512 by MatoTeziTanka: 0.9512
PR #517 by lukacf: 0.9789
PR #548 by LoquiAuris: 1.0865
PR #568 by MatoTeziTanka: 0.7853
PR #583 by suchihype: 1.1489
PR #589 by RoyiRa: 1.1178
PR #628 by Christopher-Lee-McClendon: 1.0983
PR #633 by MatoTeziTanka: 1.1526
PR #649 by pall23-mech: 1.2073
PR #659 by deanbrr: 1.0920
PR #664 by tsbiosky: 1.2982
PR #674 by newjordan: 1.0461
PR #678 by SPThole: 1.3525
PR #681 by Alfaxad: 1.4775
PR #684 by DeepReinforce: 1.0574
PR #702 by lukacf: 1.0244
PR #706 by newjordan: 1.0461
PR #714 by Upsalla: 1.1187
PR #730 by janwww: 1.1570
PR #738 by gowtham0992: 1.0970
PR #759 by markste-in: 1.3092
PR #767 by RichiiiTV: 0.9209
PR #771 by sunnypatneedi: 1.0705
PR #773 by siddhantparadox: 1.1532
PR #777 by Robby955: 0.9623
PR #785 by SirSaltySalmon: 1.5364
PR #797 by armantsaturian: 0.8960
PR #852 by Prush69: 1.1189
PR #894 by albertorkive: 1.1821
PR #905 by anthony-maio: 1.8587
PR #914 by mkenney2: 1.1873
PR #920 by CiprianFlorin-Ifrim: 1.1539
PR #924 by THUQiXuan: 0.0280
PR #925 by THUQiXuan: 0.0281
PR #979 by 0xadvait: 1.1387
PR #992 by TimS-ml: 1.4054
PR #994 by singhaikshitijjain: 1.4315
PR #999 by aamodbhatt: 1.1179
PR #1002 by SoHarshh: 1.1650
PR #1009 by SoHarshh: 1.1574
PR #1019 by abaybektursun (RECORD): 1.1147
PR #1026 by danielxmed: 1.0945
PR #1028 by newjordan: 0.8104
PR #1030 by sofiabod: 0.1130
PR #1055 by sanyalsunny111: 0.9693
PR #1056 by sofiabod: 0.0180
PR #1068 by LappyG: 1.1510
PR #1092 by teddyoweh: 1.1219
PR #1097 by danielxmed: 1.3355
PR #1100 by agalimova: 1.1465
PR #1106 by agalimova: 1.1465
PR #1107 by mradassaad: 1.5633
PR #1120 by newjordan: 1.1099
PR #1130 by Gusanidas: 1.1140
PR #1139 by ivanontech: 1.1801
PR #1140 by newjordan: 1.1874
PR #1152 by ericdatum: 1.7942
PR #1154 by LucasErcolano: 1.7757
PR #1169 by Bortlesboat: 1.1126
PR #1212 by Gusanidas: 1.1108
PR #1227 by himanshudongre: 1.4841
PR #1241 by aiejvn: 0.9901
PR #1245 by mkenney2: 1.1470
PR #1254 by Elarwei001: 1.1070
PR #1263 by xexyz: 0.9354
PR #1273 by DushyantChetiwal: 1.2196
PR #1280 by aamodbhatt: 1.1156
PR #1283 by newjordan: 1.1373
PR #1293 by 5en5e1: 1.2409
PR #1302 by vlivashkin: 1.1078
PR #1308 by newjordan: 1.1364
PR #1312 by adi-suresh01: 1.3299
PR #1322 by newjordan: 1.0854
PR #1325 by monisha-max: 1.3868
PR #1337 by sergimichi: 1.2079
PR #1349 by LocalX991: 1.3693
PR #1364 by stukenov: 1.1025
PR #1371 by aarjunsrinivasan: 1.4709
PR #1388 by CiprianFlorin-Ifrim: 1.5390
PR #1392 by Its-Just-Crump: 1.1020
PR #1395 by dttdrv: 1.0924
PR #1396 by erichroepke: 1.1067
PR #1400 by tmancino: 1.1035
PR #1403 by Rhoahndur: 1.3485
PR #1407 by OnlyJundong: 1.0960
PR #1411 by Blakethefn: 1.5568
PR #1424 by OnlyJundong: 1.0858
PR #1434 by ranausmanai: 1.5207
PR #1449 by codeprakhar25: 1.3680
PR #1458 by newjordan: 1.1057
PR #1479 by andrewbaggio1: 1.1450
PR #1486 by AlirezaAlampour: 1.6656
PR #1489 by joshkmartinez: 1.0736
PR #1494 by G3sparky: 1.1220
PR #1501 by SPThole: 1.1159
PR #1502 by SPThole: 1.1147
PR #1509 by Lumi-node: 1.1962
PR #1570 by yufang67: 1.0970
PR #1581 by aiejvn: 1.2321
PR #1607 by inin-zou: 1.4765
PR #1626 by dexhunter: 1.0719
PR #1699 by lsb: 1.4831
PR #1709 by Bananakin1: 1.1470
PR #1732 by Victory963: 1.0785

Hyperparameters Across PRs

pr_number  parameters
59  {"train_length":1024,"eval_length":2048}
61
69
85  {"rope_base":50000}
86
120  {"base":200000}
126
139  {"base":200000}
156  {"q_gain_init":1.5}
160
163  {"base":50000}
186  {"base":50000}
198
201
206  {"base":50000}
215  {"base":10000}
216
218  {"sequence_length":2048}
223  {"base":50000}
231
243  {"base":50000}
249  {"base":50000}
254  {"base":50000}
265  {"train_seq_len":1024,"auto_scales_at":2048}
281  {"base":10000}
287
290  {"base":50000}
298  {"base":500000}
305  {"base":10000}
318  {"train_seq_len":1024,"cache_tokens":8192,"effective_context":"50K+"}
324  {"context_length":4096}
325  {"dimensions":16}
326  {"eval_length":4096}
330  {"sequence_length":2048}
333  {"base":50000}
367  {"base":200000}
369  {"train_seq_len":1024}
379  {"dimensions":"16/64"}
385
393  {"base":50000}
394  {"dimensions":16}
408  {"rope_base":100000}
422
436  {"base":50000}
442  {"dimensions":16,"base_dimensions":64}
467
512  {"base":50000}
517  {"dimensions":"16/64"}
548  {"persistent":false}
568  {"base":50000}
583  {"base":50000,"partial":false}
589  {"dimensions":16,"total_dimensions":64}
628  {"partial_dims":"16/64","train_seq_length":2048}
633  {"base":50000}
649  {"dimensions":16}
659  {"dimensions":16,"total_dimensions":64}
664
674  {"dims":24}
678  {"rope_dims":64,"rope_base":10000}
681  {"dimensions":"16/64"}
684  {"dimensions":null}
702  {"dimensions":"16/64"}
706  {"dimensions":24}
714  {"train_seq_len":2048}
730  {"max_len":2048,"rope_base":5000}
738  {"dimensions":16,"base_dimensions":64}
759  {"base":100000}
767  {"dimensions":24}
771  {"dimensions":16,"total_dimensions":64}
773  {"rope_dims":16}
777  {"dims":"16/64"}
785  {"dimensions":16}
797  {"dimensions":"16/64"}
852
894
905
914
920  {"type":"yarn","max_len":2048}
924  {"dimensions":"16/64"}
925  {"dimensions":"16/64"}
979
992
994
999  {"dimensions":16}
1002  {"dimensions":16,"total_dimensions":64}
1009  {"dimensions":16}
1019  {"dimensions":16,"total_dimensions":64}
1026  {"dimensions":16,"total_dimensions":64}
1028  {"dimensions":16}
1030  {"dimensions":16}
1055  {"base":50000}
1056  {"dimensions":16}
1068  {"rope_base":10000}
1092  {"dimensions":16}
1097
1100
1106
1107
1120  {"dimensions":16}
1130  {"dimensions":"16/64"}
1139
1140  {"value":16}
1152
1154
1169  {"partial_dim":16}
1212
1227  {"dimensions":16}
1241
1245
1254
1263
1273  {"base":10000}
1280  {"dimensions":16,"base_dimensions":64}
1283  {"scales":[9,1,1]}
1293
1302  {"dimensions":"16/64"}
1308  {"scales":[9,1,1]}
1312
1322  {"dimensions":16,"base":10000}
1325  {"max_len":2048}
1337
1349  {"dimensions":16}
1364  {"dimensions":16,"total_dimensions":64}
1371
1388  {"base":5000,"max_len":2048}
1392  {"dimensions":16,"total_dimensions":64}
1395  {"dimensions":"16/64"}
1396  {"dimensions":16}
1400  {"dimensions":16}
1403
1407  {"dimensions":16,"total_dimensions":64}
1411  {"bases":[1000,10000,100000,1000000]}
1424  {"dimensions":16,"total_dimensions":64}
1434  {"base":5000}
1449
1458  {"dimensions":16}
1479  {"dimensions":16}
1486  {"base":10000}
1489  {"base":10000,"train_seq":2048}
1494  {"dimensions":16,"total_dimensions":64}
1501  {"dimensions":16,"of_total":64}
1502  {"dimensions":"16/64"}
1509  {"epsilon":0.1}
1570  {"dimensions":32}
1581
1607  {"base":10000}
1626  {"dimensions":16,"base_dimensions":64}
1699  {"base":10000}
1709  {"dimensions":16,"base":10000}
1732  {"rotary_dims":16,"total_dims":64}
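
The two settings that recur most often in the table are a frequency base (for example 10000 or 50000) and a rotated-dimension count such as 16 out of 64. The sketch below shows one plausible reading of those settings: a "partial" RoPE that rotates only the first rotary_dims entries of a head vector and passes the rest through unchanged. It is an illustration only, not the implementation from any listed PR. The function names, the adjacent-pair layout, the default values, and the interpretation of "16/64" as "16 of 64 head dimensions" are all assumptions.

```python
import math


def rope_frequencies(rotary_dims, base=10000.0):
    """Per-pair rotation frequencies base^(-2i / rotary_dims) (hypothetical helper)."""
    if rotary_dims <= 0 or rotary_dims % 2 != 0:
        raise ValueError("rotary_dims must be a positive even number")
    return [base ** (-2.0 * i / rotary_dims) for i in range(rotary_dims // 2)]


def apply_partial_rope(vec, position, rotary_dims=16, base=10000.0):
    """Rotate the first rotary_dims entries of one head vector by position-dependent angles.

    Assumption: entries (0, 1), (2, 3), ... form the rotated pairs. Other
    implementations pair entry i with entry i + rotary_dims // 2 instead.
    Entries at index rotary_dims and beyond are copied through unchanged.
    """
    if rotary_dims > len(vec):
        raise ValueError("rotary_dims cannot exceed the vector length")
    out = list(vec)
    for i, freq in enumerate(rope_frequencies(rotary_dims, base)):
        angle = position * freq
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * cos_a - y * sin_a
        out[2 * i + 1] = x * sin_a + y * cos_a
    return out


# Usage: a 64-dim head vector with 16 rotated dims and a base of 50000.
query = [0.1 * (i + 1) for i in range(64)]
rotated = apply_partial_rope(query, position=5, rotary_dims=16, base=50000.0)
```

With this layout, the dot product between a rotated query and a rotated key depends only on the difference of their positions, which is the property RoPE relies on. A larger base lowers the per-pair frequencies, and a smaller rotary_dims leaves more of the vector position-independent.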