L-BFGS

Optimizer
Used in: 2 PRs
Best BPB: 0.2282
Avg BPB: 0.6164

Hyperparameters Across PRs

pr_number | weight_decay | momentum | other_params
1350 | - | - | {"max_iter":25,"history":20,"line_search":"strong_wolfe","space":"logit","warm_start":true,"delta_clamp":5,"focal_loss_last_tokens":128,"causal":true}
1507 | - | - | {"history_size":10,"line_search":"strong Wolfe","steps":6}
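The two rows use different key names for what appear to be the same settings (history vs. history_size, "strong_wolfe" vs. "strong Wolfe"). The sketch below shows one way to normalize them for comparison. The target names (max_iter, history_size, line_search_fn) follow the keyword arguments of PyTorch's torch.optim.LBFGS; that either PR actually uses that class is an assumption, and the alias table and the to_lbfgs_kwargs helper are hypothetical. Keys with no obvious counterpart, such as steps or delta_clamp, are left untouched in a separate dict rather than guessed at.

```python
import json

# other_params values copied verbatim from the table rows.
ROWS = {
    1350: '{"max_iter":25,"history":20,"line_search":"strong_wolfe",'
          '"space":"logit","warm_start":true,"delta_clamp":5,'
          '"focal_loss_last_tokens":128,"causal":true}',
    1507: '{"history_size":10,"line_search":"strong Wolfe","steps":6}',
}

# Assumed aliases: map each PR's key onto a torch.optim.LBFGS-style name.
ALIASES = {"history": "history_size", "line_search": "line_search_fn"}
LBFGS_KEYS = {"max_iter", "history_size", "line_search_fn"}


def to_lbfgs_kwargs(raw):
    """Split a raw other_params JSON string into (optimizer kwargs, extras)."""
    kwargs, extras = {}, {}
    for key, value in json.loads(raw).items():
        name = ALIASES.get(key, key)
        if name == "line_search_fn":
            # Unify "strong Wolfe" and "strong_wolfe" spellings.
            value = value.lower().replace(" ", "_")
        if name in LBFGS_KEYS:
            kwargs[name] = value
        else:
            # PR-specific settings with no assumed optimizer counterpart.
            extras[name] = value
    return kwargs, extras


for pr, raw in ROWS.items():
    kwargs, extras = to_lbfgs_kwargs(raw)
    print(pr, kwargs, extras)
```

After normalization, both PRs report a strong Wolfe line search and differ in history size (20 vs. 10). Whether steps in PR 1507 plays the same role as max_iter in PR 1350 cannot be determined from the table, so the sketch does not merge them.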