tied embeddings
Architecture
Used in: 211 PRs
Best BPB: 0.2952
Avg BPB: 1.1806
Submissions
| pr_number | author | bpb | record |
|---|---|---|---|
| 30 | JackYoung27 | 1.2663 | |
| 34 | ChenLiu-1996 | 1.2244 | |
| 37 | khasinski | 1.2012 | |
| 39 | nanlliu | 1.2139 | RECORD |
| 42 | chonchiog | 1.2197 | |
| 44 | daniellawson9999 | 1.1111 | |
| 45 | kiankyars | 1.2240 | |
| 49 | spokane-way | 1.2058 | RECORD |
| 52 | spokane-way | 1.2014 | RECORD |
| 53 | kshitizz36 | 1.1888 | |
| 59 | notapplica | 1.2160 | |
| 61 | saml212 | 1.2154 | |
| 63 | yahya010 | 1.1598 | RECORD |
| 65 | aquariouseworkman | 1.1556 | RECORD |
| 66 | arjun-krishna1 | 1.1632 | |
| 69 | TevBenji | 1.1708 | |
| 70 | jfprincz | 1.1659 | |
| 71 | AntDX316 | 1.3509 | |
| 74 | takhir-iota | 1.1884 | |
| 75 | takhir-iota | 1.1768 | |
| 79 | Marvbuster | 1.8698 | |
| 85 | hydeh3r3 | 1.2156 | |
| 86 | aruniyer | 1.1502 | RECORD |
| 88 | seanward | 1.1605 | |
| 93 | aamodbhatt | 1.3693 | |
| 95 | MatoTeziTanka | 1.1836 | |
| 96 | saml212 | 1.1764 | |
| 103 | MatthewHRockwell | 1.5000 | |
| 107 | m0at | 1.1648 | |
| 108 | kellyvv | 1.4370 | |
| 110 | mr-ashish-panday | 1.2244 | |
| 113 | JoeProAI | 1.1870 | |
| 114 | saml212 | 1.1574 | |
| 123 | saikrishnarallabandi | 1.1642 | |
| 125 | akshai0296 | 1.3797 | |
| 126 | Athenox14 | 1.7510 | |
| 128 | rsavitt | 1.1594 | |
| 131 | Billy1900 | 1.2701 | |
| 136 | ibarrajo | 1.2101 | |
| 139 | ksang123 | 1.2029 | |
| 142 | ankitmaloo | 1.1925 | |
| 143 | Julz19 | 1.1779 | |
| 145 | mrdavtan | 1.2052 | |
| 146 | swapp1990 | 1.2987 | |
| 147 | ankitmaloo | 1.1631 | |
| 150 | yahya010 | 1.1478 | |
| 151 | mrdavtan | 1.2045 | |
| 152 | timowhite88 | 1.1744 | |
| 156 | dexhunter | 1.1602 | |
| 160 | ChaseWNorton | 1.1623 | |
| 163 | Focus2321 | 1.2091 | |
| 166 | chinesepowered | 1.1550 | |
| 168 | spokane-way | 1.0217 | |
| 169 | beee003 | 1.1973 | |
| 170 | baudrillardsgh0st | 1.1669 | |
| 172 | GMaN1911 | 1.1812 | |
| 173 | tamoghnokandar | 1.1532 | |
| 176 | GLDRoger | 1.1732 | |
| 180 | thwu1 | 1.1428 | RECORD |
| 183 | anantdgoel | 1.2529 | |
| 184 | Idan3011 | 1.1855 | |
| 190 | newjordan | 1.1725 | |
| 191 | chris-buckley | 1.1598 | |
| 192 | baudrillardsgh0st | 1.1502 | |
| 193 | KHUCHAN | 1.2917 | |
| 194 | baudrillardsgh0st | 1.1480 | |
| 195 | chasewebb | 1.2355 | |
| 200 | khasinski | 1.2012 | |
| 204 | Akasxh | 1.2320 | |
| 206 | dexhunter | 1.1507 | |
| 209 | JWLBOYCE | 1.1624 | |
| 211 | dubthecat | 1.1719 | |
| 215 | JayCheng113 | 1.1548 | |
| 217 | kshitizz36 | 1.1753 | |
| 218 | bopmite | 1.1248 | |
| 219 | alertcat | 1.1541 | |
| 222 | ansh-deriv | 1.1601 | |
| 223 | 0xjaishy | 1.1326 | |
| 231 | lenguyen1807 | 1.2036 | |
| 232 | kellyvv | 1.4370 | |
| 237 | takoyakisoft | 1.8389 | |
| 240 | riatzukiza | 1.6660 | |
| 247 | riatzukiza | 1.6114 | |
| 248 | riatzukiza | 1.6231 | |
| 251 | kshitizz36 | 1.1596 | |
| 252 | greqone | 1.1554 | |
| 256 | IvGolovach | 1.1779 | |
| 258 | riatzukiza | 1.6572 | |
| 262 | ibarrajo | 1.0539 | |
| 263 | Dannybc123 | 1.5382 | |
| 264 | stukenov | 1.1455 | |
| 265 | unnir | 1.1307 | |
| 266 | User123331 | 1.3932 | |
| 267 | andrewgcodes | 1.1374 | |
| 273 | dentity007 | 1.1575 | |
| 274 | haikosys | 1.1403 | |
| 275 | ibarrajo | 1.0539 | |
| 278 | nicolasdickenmann | 1.0365 | |
| 284 | DanishjeetSingh | 1.4106 | |
| 294 | sseanliu | 1.1645 | |
| 295 | gowtham0992 | 1.1477 | |
| 296 | sseanliu | 1.1645 | |
| 301 | lookin-zz | 1.1807 | |
| 306 | xuafeng | 1.1448 | |
| 309 | NewyorkDev | 1.1914 | |
| 327 | Ananddna | 1.1450 | |
| 330 | bopmite | 1.1609 | |
| 333 | mahsumaktas | 1.1565 | |
| 343 | joeynyc | 1.2459 | |
| 344 | aryanbhosale | 1.1330 | |
| 349 | Mapika | 1.1399 | |
| 351 | sp00mm | 1.1659 | |
| 352 | sp00mm | 1.1659 | |
| 362 | mkenney2 | 1.1497 | |
| 367 | ksang123 | 1.1770 | |
| 368 | MatoTeziTanka | 1.2037 | |
| 369 | signalrush | 1.1328 | |
| 373 | JoeProAI | 1.1634 | |
| 374 | unnir | 1.1246 | RECORD |
| 381 | codestrongestx | 1.1739 | |
| 383 | joelnishanth | 1.1320 | |
| 385 | dentity007 | 1.1488 | |
| 388 | ElliotSlusky | 1.1231 | |
| 390 | newjordan | 1.1295 | |
| 393 | CrimsonSithria | 1.2417 | |
| 394 | greqone | 1.1247 | |
| 401 | newjordan | 1.1243 | |
| 406 | dentity007 | 1.1287 | |
| 407 | itu-itis24-buyukhelvacigilm24 | 1.3208 | |
| 414 | signalrush | 1.1233 | |
| 418 | yashverms | 1.1715 | |
| 420 | leofeasby | 1.1454 | |
| 426 | aniketio-ctrl | 1.2026 | |
| 432 | jadechip | 1.5295 | |
| 434 | parinzee | 1.1370 | |
| 436 | CrimsonSithria | 1.2392 | |
| 437 | jupram | 1.2257 | |
| 444 | AymanMahfuz27 | 1.4536 | |
| 446 | sofiabod | 1.1933 | |
| 451 | harborglowvintage-oss | 1.1464 | |
| 455 | kasimte | 1.1299 | |
| 460 | abhishekrajdhar | 1.2928 | |
| 465 | LoquiAuris | 1.1508 | |
| 466 | simonbissonnette | 1.1354 | |
| 467 | ADIITJ | 1.1428 | |
| 470 | leofeasby | 1.1454 | |
| 477 | harsha-gouru | 1.1522 | |
| 478 | gowtham0992 | 1.1268 | |
| 481 | mrdavtan | 1.0970 | |
| 485 | harsha-gouru | 1.1522 | |
| 489 | sofiabod | 1.1327 | |
| 492 | Divyesh-Thirukonda | 1.1591 | |
| 495 | SergiuDeveloper | 1.2092 | |
| 498 | newjordan | 1.1478 | |
| 499 | newjordan | 1.1478 | |
| 508 | newjordan | 1.1215 | |
| 512 | MatoTeziTanka | 0.9512 | |
| 516 | Asukabot0 | 1.1428 | |
| 518 | sofiabod | 1.0622 | |
| 525 | hypery11 | 1.1160 | |
| 532 | NotADevIAmaMeatPopsicle | 1.0487 | |
| 550 | haimianbaobao007 | 1.1890 | |
| 552 | loveless2001 | 1.1634 | |
| 554 | chrisnkuno | 1.4612 | |
| 559 | Parswanadh | 1.5348 | |
| 563 | instax-dutta | 1.1428 | |
| 568 | MatoTeziTanka | 0.7853 | |
| 573 | Sarimsaljook | 1.0523 | |
| 587 | newjordan | 1.1208 | |
| 588 | andyluo22 | 1.4120 | |
| 595 | LoquiAuris | 1.1100 | |
| 605 | bigbag | 0.7227 | |
| 612 | Christopher-Lee-McClendon | 1.1079 | |
| 622 | Upsalla | 1.0941 | |
| 634 | raahilshah | 1.1171 | |
| 636 | NewyorkDev | 1.1234 | |
| 649 | pall23-mech | 1.2073 | |
| 662 | simon-marcus | 1.1208 | |
| 664 | tsbiosky | 1.2982 | |
| 665 | harborglowvintage-oss | 1.1464 | |
| 666 | chrislovescoding | 1.1932 | |
| 669 | amabito | 1.4942 | |
| 678 | SPThole | 1.3525 | |
| 695 | 0xNoramiya | 1.1360 | |
| 705 | seanward | 1.2151 | |
| 706 | newjordan | 1.0461 | |
| 709 | StolbaJ | 1.1478 | |
| 710 | Dhruba531 | 1.1240 | |
| 716 | SHN2004 | 1.4239 | |
| 724 | hypery11 | 1.0717 | |
| 727 | Asukabot0 | 0.9674 | |
| 730 | janwww | 1.1570 | |
| 731 | pentxayc | 1.0400 | |
| 754 | aryanbhosale | 1.1253 | |
| 758 | hypery11 | 1.0465 | |
| 763 | hypery11 | 0.9917 | |
| 769 | MatoTeziTanka | 0.8508 | |
| 779 | deanbrr | 0.6683 | |
| 793 | pall23-mech | 1.2500 | |
| 797 | armantsaturian | 0.8960 | |
| 806 | ibarrajo | 0.6678 | |
| 808 | Naazimsnh02 | 0.6364 | |
| 809 | AayushBaniya2006 | 0.2952 | |
| 810 | Idan3011 | 0.9393 | |
| 816 | jimliu741523 | 1.1194 | |
| 822 | henrycashe26 | 1.2604 | |
| 828 | bigbag | 0.9076 | |
| 838 | aryanbhosale | 1.1215 | |
| 841 | someone114514 | 1.1157 | |
| 849 | dttdrv | 1.1105 | |
| 858 | nickferrantelive | 1.2135 | |

Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 30 | — |
| 34 | — |
| 37 | {"tie_embeddings":0} |
| 39 | — |
| 42 | {"tie_embeddings":1} |
| 44 | — |
| 45 | — |
| 49 | — |
| 52 | — |
| 53 | {"tie_embeddings":1} |
| 59 | — |
| 61 | — |
| 63 | — |
| 65 | — |
| 66 | — |
| 69 | — |
| 70 | — |
| 71 | — |
| 74 | {"tie_embeddings":1} |
| 75 | {"tie_embeddings":1} |
| 79 | — |
| 85 | — |
| 86 | {"vocab_size":1024} |
| 88 | {"fp16_export":1} |
| 93 | — |
| 95 | — |
| 96 | — |
| 103 | — |
| 107 | — |
| 108 | — |
| 110 | — |
| 113 | — |
| 114 | — |
| 123 | — |
| 125 | {"enabled":1} |
| 126 | — |
| 128 | {"tie_embeddings":1} |
| 131 | — |
| 136 | — |
| 139 | — |
| 142 | — |
| 143 | {"tie_embeddings":1} |
| 145 | {"layers":9,"dimensions":512} |
| 146 | — |
| 147 | — |
| 150 | — |
| 151 | — |
| 152 | — |
| 156 | — |
| 160 | {"tie_embeddings":1} |
| 163 | — |
| 166 | — |
| 168 | — |
| 169 | — |
| 170 | — |
| 172 | — |
| 173 | — |
| 176 | — |
| 180 | — |
| 183 | — |
| 184 | — |
| 190 | {"fp16":true} |
| 191 | — |
| 192 | — |
| 193 | — |
| 194 | — |
| 195 | — |
| 200 | — |
| 204 | — |
| 206 | — |
| 209 | {"layers":11,"vocab":1024,"dim":512,"heads":8,"kv":4,"mlp_hidden":1536} |
| 211 | — |
| 215 | — |
| 217 | — |
| 218 | — |
| 219 | {"vocab":1024} |
| 222 | {"fp16_passthrough":true} |
| 223 | — |
| 231 | — |
| 232 | — |
| 237 | — |
| 240 | — |
| 247 | — |
| 248 | — |
| 251 | — |
| 252 | {"layers":9,"width":512,"sp":1024} |
| 256 | — |
| 258 | — |
| 262 | — |
| 263 | {"layers":9,"dim":512,"heads":8,"kv_heads":4,"mlp_multiplier":2} |
| 264 | — |
| 265 | — |
| 266 | — |
| 267 | — |
| 273 | — |
| 274 | — |
| 275 | — |
| 278 | — |
| 284 | — |
| 294 | {"vocab_size":1024,"dimension":768} |
| 295 | — |
| 296 | — |
| 301 | — |
| 306 | — |
| 309 | — |
| 327 | — |
| 330 | — |
| 333 | — |
| 343 | — |
| 344 | — |
| 349 | — |
| 351 | — |
| 352 | — |
| 362 | — |
| 367 | — |
| 368 | — |
| 369 | — |
| 373 | — |
| 374 | — |
| 381 | {"layers":10,"model_dim":512,"num_heads":8,"num_kv_heads":4} |
| 383 | — |
| 385 | — |
| 388 | — |
| 390 | — |
| 393 | — |
| 394 | — |
| 401 | — |
| 406 | — |
| 407 | — |
| 414 | — |
| 418 | — |
| 420 | — |
| 426 | — |
| 432 | — |
| 434 | — |
| 436 | — |
| 437 | — |
| 444 | — |
| 446 | — |
| 451 | — |
| 455 | — |
| 460 | — |
| 465 | — |
| 466 | — |
| 467 | — |
| 470 | — |
| 477 | — |
| 478 | — |
| 481 | — |
| 485 | — |
| 489 | {"vocab_size":1024} |
| 492 | — |
| 495 | {"TIE_EMBEDDINGS":1} |
| 498 | — |
| 499 | — |
| 508 | — |
| 512 | — |
| 516 | — |
| 518 | {"vocab_size":1024} |
| 525 | — |
| 532 | — |
| 550 | — |
| 552 | — |
| 554 | — |
| 559 | {"precision":"FP16"} |
| 563 | — |
| 568 | — |
| 573 | — |
| 587 | — |
| 588 | {"TIE_EMBEDDINGS":0} |
| 595 | — |
| 605 | — |
| 612 | — |
| 622 | — |
| 634 | — |
| 636 | — |
| 649 | — |
| 662 | — |
| 664 | — |
| 665 | — |
| 666 | — |
| 669 | — |
| 678 | — |
| 695 | — |
| 705 | — |
| 706 | — |
| 709 | {"vocab_size":1024} |
| 710 | — |
| 716 | — |
| 724 | — |
| 727 | — |
| 730 | {"embed_dim":254,"vocab_size":8192} |
| 731 | — |
| 754 | — |
| 758 | — |
| 763 | — |
| 769 | — |
| 779 | — |
| 793 | — |
| 797 | {"softcap":30} |
| 806 | — |
| 808 | — |
| 809 | — |
| 810 | — |
| 816 | — |
| 822 | — |
| 828 | — |
| 838 | — |
| 841 | — |
| 849 | — |
| 858 | — |
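
The `parameters` cells above use different key names for what look like the same settings (for example `tie_embeddings` and `TIE_EMBEDDINGS`, or `vocab` and `vocab_size`). A minimal sketch of how the cells could be normalized before comparing PRs follows. The alias map and the canonical names are assumptions made for illustration based only on the key spellings in the table; they are not part of the page. Keys whose meaning is unclear from the table alone (such as `sp`, `width`, or `enabled`) are deliberately left unmapped.

```python
import json

# Hypothetical alias map (an assumption, inferred from key spellings only).
ALIASES = {
    "TIE_EMBEDDINGS": "tie_embeddings",
    "vocab": "vocab_size",
    "dim": "model_dim",
    "dimension": "model_dim",
    "dimensions": "model_dim",
    "heads": "num_heads",
    "kv": "num_kv_heads",
    "kv_heads": "num_kv_heads",
}

EMPTY_CELL = "—"  # the table's marker for "no parameters reported"


def normalize(cell: str) -> dict:
    """Parse one `parameters` cell and rename known aliases to one key."""
    cell = cell.strip()
    if cell in ("", EMPTY_CELL):
        return {}
    raw = json.loads(cell)
    # Unknown keys pass through unchanged.
    return {ALIASES.get(key, key): value for key, value in raw.items()}


# A few cells copied from the table above, keyed by pr_number.
rows = {
    30: "—",
    209: '{"layers":11,"vocab":1024,"dim":512,"heads":8,"kv":4,"mlp_hidden":1536}',
    381: '{"layers":10,"model_dim":512,"num_heads":8,"num_kv_heads":4}',
    495: '{"TIE_EMBEDDINGS":1}',
}

normalized = {pr: normalize(cell) for pr, cell in rows.items()}
for pr, params in normalized.items():
    print(pr, params)
```

After normalization, PRs 209 and 381 both expose `model_dim`, `num_heads`, and `num_kv_heads`, so they can be compared field by field.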