weight pruning

Regularization
Used in: 2 PRs
Best BPB: 1.1326
Avg BPB: 1.1401

Hyperparameters Across PRs

pr_number  parameters
637        {"amount":"10%","scope":"non-embedding linear weights","timing":"post-SWA, pre-quantization"}
861        {"sparsity":0.15,"description":"Prune smallest weights before quantization for better compression."}
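
Both entries describe zeroing a fixed fraction of the smallest-magnitude weights before quantization. A minimal sketch of that idea follows, assuming a flat list of weights and a global magnitude ranking. The function name `magnitude_prune` and the sample values are hypothetical and not taken from either PR. Restricting pruning to non-embedding linear layers and running it after SWA (as in PR 637) are left out.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.0.

    Hypothetical helper for illustration; not the code used in either PR.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = round(len(weights) * sparsity)  # number of entries to zero out
    if k == 0:
        return list(weights)
    # Rank indices by absolute value; the first k are the smallest weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Made-up example weights (20 values).
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.9, -0.4, 0.07, 0.5,
           -0.7, 0.15, 0.25, -0.03, 0.45, 0.6, -0.2, 0.35, 0.1, -0.9]

pruned_15 = magnitude_prune(weights, 0.15)  # sparsity from PR 861
pruned_10 = magnitude_prune(weights, 0.10)  # "10%" amount from PR 637
```

The zeroed entries form runs of identical values after quantization, which is the likely reason the PR 861 description links pruning to better compression.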