EMA + Tight SWA

Category: Weight Averaging
Used in: 28 PRs
Best BPB: 0.0280
Avg BPB: 0.9975

Hyperparameters Across PRs

pr_number  parameters
625        {"EMA_decay":0.997,"SWA_every":50}
634        {"EMA_decay":0.997,"SWA_frequency_steps":50,"SWA_scale_threshold":0.2}
653        {"ema_decay":0.997,"swa_every":50}
657        {"ema_decay":0.997,"swa_scale_max":0.2}
794        {"ema_decay":0.997,"swa_every":50}
872        {"ema_decay":0.997,"swa_interval":50}
884        {"decay":0.997,"swa_interval_steps":50}
885        {"ema_decay":0.997}
889        {"decay":0.997}
890        {"ema_decay":0.997}
924        {"ema_decay":0.997,"swa_interval":50}
925        {"ema_decay":0.997,"swa_every":50}
952        {"ema_decay":0.997,"swa_every":50}
1026       {"ema_decay":0.997,"swa_every":50}
1084       {"ema_decay":0.997,"swa_every":50}
1128       {"decay":0.997,"swa_every":50}
1202       {"ema_decay":0.997,"swa_every":50}
1263       {"decay":0.997,"swa_every":50,"swa_start_step":4600}
1269       {"ema_decay":0.997,"swa_every":50}
1276       {"ema_decay":0.997,"swa_every":50}
1280       {"ema_decay":0.997,"swa_every":50}
1284       {"ema_decay":0.997,"swa_interval":50}
1303       {"decay":0.997}
1392       {"ema_decay":0.997,"swa_every":50}
1407       {"ema_decay":0.997,"swa_every":50}
1424       {"ema_decay":0.997,"swa_every":50}
1446       {"decay":0.997,"swa_every":50}
1696       {"ema_decay":0.997,"swa_interval":50}
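
Illustrative sketch

The table only records hyperparameter values, not how the PRs apply them. As a rough illustration of what an EMA decay of 0.997 and an SWA interval of 50 steps could mean in practice, here is a minimal pure-Python sketch. The class name WeightAverager, its update method, and the lr_scale argument are hypothetical, and the rule that SWA snapshots are only taken once the learning-rate scale has dropped to 0.2 or below is an assumed reading of the swa_scale_max / SWA_scale_threshold keys, not something the source confirms. How the PRs combine the two averages into final weights is not stated here, so the sketch simply tracks both.

```python
from dataclasses import dataclass, field


@dataclass
class WeightAverager:
    """Tracks an EMA and a periodic SWA over a flat list of weights (illustrative only)."""

    ema_decay: float = 0.997    # value reported by every PR in the table
    swa_every: int = 50         # snapshot interval reported by most PRs
    swa_scale_max: float = 0.2  # ASSUMPTION: snapshot only when lr_scale <= this value
    ema: list = field(default_factory=list)
    swa: list = field(default_factory=list)
    swa_count: int = 0

    def update(self, weights, step, lr_scale):
        # EMA: ema <- decay * ema + (1 - decay) * w, seeded from the first weights seen.
        if not self.ema:
            self.ema = list(weights)
        else:
            d = self.ema_decay
            self.ema = [d * e + (1.0 - d) * w for e, w in zip(self.ema, weights)]

        # SWA: equal-weight running mean of snapshots taken every `swa_every` steps,
        # restricted to the low-learning-rate tail of training (assumed gating rule).
        if step % self.swa_every == 0 and lr_scale <= self.swa_scale_max:
            self.swa_count += 1
            if self.swa_count == 1:
                self.swa = list(weights)
            else:
                n = self.swa_count
                self.swa = [s + (w - s) / n for s, w in zip(self.swa, weights)]


# Usage: feed the current weights once per training step.
averager = WeightAverager()
averager.update([1.0], step=0, lr_scale=1.0)     # EMA seeded; LR too high for SWA
averager.update([2.0], step=50, lr_scale=0.1)    # first SWA snapshot
averager.update([4.0], step=100, lr_scale=0.05)  # second SWA snapshot
```

With a decay of 0.997 the EMA moves slowly (each step contributes 0.3% of the new weights), while the SWA in this sketch weights every retained snapshot equally.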