Gated Attention
Architecture
Used in: 73 PRs
Best BPB: 0.0281
Avg BPB: 1.0038
Submissions
| PR | Author | BPB |
|---|---|---|
| #344 | aryanbhosale | 1.1330 |
| #413 | anantdgoel | 1.4525 |
| #430 | sahiee-dev | 1.1428 |
| #474 | joshuaswarren | 1.1690 |
| #487 | anantdgoel | 1.1720 |
| #516 | Asukabot0 | 1.1428 |
| #562 | bigbag | 1.1354 |
| #635 | aryanbhosale | 1.1330 |
| #638 | Asukabot0 | 1.1164 |
| #670 | abaybektursun | 1.1171 |
| #715 | Asukabot0 | 1.0337 |
| #727 | Asukabot0 | 0.9674 |
| #733 | stukenov | 1.0278 |
| #745 | stukenov | 1.0222 |
| #754 | aryanbhosale | 1.1253 |
| #758 | hypery11 | 1.0465 |
| #761 | Asukabot0 | 0.9581 |
| #763 | hypery11 | 0.9917 |
| #788 | hypery11 | 0.9059 |
| #795 | hypery11 | 0.8881 |
| #813 | hypery11 | 0.6671 |
| #828 | bigbag | 0.9076 |
| #838 | aryanbhosale | 1.1215 |
| #850 | callithyia | 0.3212 |
| #864 | aryanbhosale | 0.2841 |
| #865 | aryanbhosale | 0.2841 |
| #871 | greqone | 0.8004 |
| #875 | shalyhinpavel | 1.0226 |
| #893 | aryanbhosale | 0.1310 |
| #909 | sunnypatneedi | 0.8609 |
| #921 | TimPietrusky | 0.0939 |
| #925 | THUQiXuan | 0.0281 |
| #940 | antaloaalonso | 0.9581 |
| #950 | jzgdev | 1.3178 |
| #952 | FlashyFlash3011 | 1.1144 |
| #963 | sunnypatneedi | 0.8609 |
| #1001 | ibarrajo | 1.1188 |
| #1036 | ivanontech | 1.1974 |
| #1152 | ericdatum | 1.7942 |
| #1159 | JDAppleseed | 0.3693 |
| #1170 | Christopher-Lee-McClendon | 1.1199 |
| #1185 | skoustav35 | 0.9641 |
| #1218 | clarkkev (RECORD) | 1.0978 |
| #1232 | Christopher-Lee-McClendon | 1.0929 |
| #1283 | newjordan | 1.1373 |
| #1287 | dentity007 | 1.1048 |
| #1307 | amrayach | 1.1101 |
| #1311 | htrung1105 | 1.1303 |
| #1410 | izlley | 1.1158 |
| #1452 | bsisduck | 0.3509 |
| #1454 | bsisduck | 0.3509 |
| #1490 | wisebreadloaf | 1.6110 |
| #1520 | taka6745 | 1.0824 |
| #1536 | dexhunter | 1.0775 |
| #1537 | pireylow | 1.3971 |
| #1553 | Abhishek8108 | 1.2097 |
| #1573 | shivangbaveja | 1.1464 |
| #1585 | codemath3000 | 1.0639 |
| #1627 | mike-ferguson | 1.3246 |
| #1633 | joshkmartinez | 1.0585 |
| #1667 | MarioPaerle | 1.0714 |
| #1670 | dexhunter | 1.0597 |
| #1671 | souro26 | 1.3827 |
| #1671 | souro26 | 1.3827 |
| #1683 | yunoshev | 1.1280 |
| #1689 | chris-colinsky | 1.0822 |
| #1697 | Buld1n | 1.0812 |
| #1728 | mikeapedia | 1.0771 |
| #1734 | yahya010 | 1.0108 |
| #1736 | dexhunter | 1.0655 |
| #1738 | alertcat | 1.0354 |
| #1751 | Pravin-dev06 | 1.3565 |
| #1756 | romeerp | 1.0651 |

Hyperparameters Across PRs
| PR | Parameters |
|---|---|
| 344 | — |
| 413 | {"bias_init":4} |
| 430 | {"layers":10} |
| 474 | — |
| 487 | {"added_params":37000} |
| 516 | — |
| 562 | — |
| 635 | — |
| 638 | — |
| 670 | — |
| 715 | — |
| 727 | — |
| 733 | — |
| 745 | — |
| 754 | — |
| 758 | — |
| 761 | — |
| 763 | — |
| 788 | — |
| 795 | — |
| 813 | — |
| 828 | — |
| 838 | — |
| 850 | {"bias":4} |
| 864 | — |
| 865 | — |
| 871 | — |
| 875 | {"layers":8,"final_attention_layer":1,"n_embd":384} |
| 893 | — |
| 909 | — |
| 921 | {"layers":11,"dim":512,"heads":8,"kv_heads":4} |
| 925 | {"experts":16,"hidden_size":512} |
| 940 | — |
| 950 | — |
| 952 | {"weight_init":0,"bias_init":4} |
| 963 | — |
| 1001 | — |
| 1036 | {"layers":12} |
| 1152 | {"init":0.1} |
| 1159 | {"enabled":0} |
| 1170 | — |
| 1185 | — |
| 1218 | — |
| 1232 | {"qk_gain_init":1.5} |
| 1283 | {"qk_gain_init":4} |
| 1287 | — |
| 1307 | {"layers":[2,4,6,8,10],"window_size":512} |
| 1311 | {"enabled":false} |
| 1410 | {"layers":[0,2,4,6,8,10]} |
| 1452 | {"layers":9} |
| 1454 | {"layers":9} |
| 1490 | {"layers":[1,3],"kv_heads":2} |
| 1520 | — |
| 1536 | — |
| 1537 | {"layer_start":7} |
| 1553 | {"qk_gain":5} |
| 1573 | — |
| 1585 | {"start_layer":8} |
| 1627 | — |
| 1633 | {"qk_gain_init":5.25} |
| 1667 | {"width":12,"layers":11} |
| 1670 | — |
| 1671 | — |
| 1671 | — |
| 1683 | — |
| 1689 | {"qk_gain":5.25} |
| 1697 | {"looped_band_layers":"3..5","recur_attn_gate":1,"recur_attn_gate_scale":0.5} |
| 1728 | — |
| 1734 | {"layers":10,"dimensions":544,"heads":8,"kv_share_stride":2} |
| 1736 | {"init_std":0.005} |
| 1738 | {"qk_gain":5.25} |
| 1751 | — |
| 1756 | — |