Tolerate QN linesearch failures when it's harmless #3791

achirkin · 2021-04-23T16:14:48Z

In some cases, the linesearch subroutine of the Quasi-Newton solver (logistic regression) may fail in a way that is tolerable for the solver. Currently, any linesearch failure makes the outer algorithm stop, and most of the time the found solution seems to be acceptable to a good precision. This proposal suggests to ignore some non-critical failures ( LS_MAX_ITERS_REACHED and LS_INVALID_STEP_MIN) and stop peacefully if the convergence check triggers or if the function change is too small.

The PR mostly changes the logging behavior, with one small exception. The last condition in update_and_check won't trigger if the linesearch error is not critical and if the target function goes down (i.e. the solver still advances). Previously, the solver would stop independently of the target change.

achirkin · 2021-04-26T14:01:20Z

Note, this PR makes more sense with #3777 , which sets a reasonable default for param.delta. And it ought to be the last PR to close #3645 (I'll check and close it manually if that's the case).

cjnolet

I think the theory behind this approach sounds fine to me- though it might lead to a marginal decrease in perf, it seems like it should provide a better solution in these cases. I just have one little nitpick and a question.

cpp/src/glm/qn/qn_solvers.cuh

achirkin · 2021-04-27T14:09:39Z

NB: sometimes, linesearch fails with LS_MAX_ITERS_REACHED while increasing the loss by a small value. Normally, this indicates that the solver has converged, but it could also mean there is a bug in the solver somewhere. In either case we stop the solver, but I am not sure whether we should report that as info, warning, or error.

dantegd · 2021-05-14T20:39:54Z

@achirkin I was wondering what's the status of this PR? It seems to me that it has addressed @cjnolet's comment, if so is it waiting for a re-review?

cjnolet · 2021-05-14T20:42:19Z

@dantegd this PR was converted back to a draft after my review to I was under the impression @achirkin was still ironing some things out. @achirkin, please let me know if this is ready and I can re-review.

achirkin · 2021-05-15T06:09:09Z

Sorry, I should have changed the labels. The solution I propose here is not perfect (it would still report an error when the target seems to grow due to numerical error), and the problem itself is not really pressing (wrong errors in logs). Thus, after a short discussion with @tfeher, we've decided to postpone this until we get a better idea on how to address it.

achirkin · 2021-07-21T12:46:51Z

Update in the logic:

If the lineseach fails, report this as a warning rather than an error.
Don't report it at all, if convergence check succeeds.

tfeher

Thanks Artem for this PR! Please update the PR description to state under which circumstances the stopping condition could differ from the earlier version. Otherwise it looks good.

cpp/src/glm/qn/qn_solvers.cuh

Co-authored-by: Tamas Bela Feher <[email protected]>

codecov-commenter · 2021-07-22T14:45:24Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@5b36ced). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.08    #3791   +/-   ##
===============================================
  Coverage                ?   85.72%           
===============================================
  Files                   ?      230           
  Lines                   ?    18191           
  Branches                ?        0           
===============================================
  Hits                    ?    15595           
  Misses                  ?     2596           
  Partials                ?        0

Flag	Coverage Δ
dask	`48.20% <0.00%> (?)`
non-dask	`78.16% <0.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b36ced...c9aaab2. Read the comment docs.

tfeher

Thanks Artem for the update, the PR looks good to me.

cjnolet

LGTM. Thanks Artem!

cjnolet · 2021-07-22T21:15:42Z

@gpucibot merge

In some cases, the linesearch subroutine of the Quasi-Newton solver (logistic regression) may fail in a way that is tolerable for the solver. Currently, any linesearch failure makes the outer algorithm stop, and most of the time the found solution seems to be acceptable to a good precision. This proposal suggests to ignore some non-critical failures ( `LS_MAX_ITERS_REACHED` and `LS_INVALID_STEP_MIN`) and stop peacefully if the convergence check triggers or if the function change is too small. The PR mostly changes the logging behavior, with one small exception. The last condition in `update_and_check` won't trigger if the linesearch error is not critical and if the target function goes down (i.e. the solver still advances). Previously, the solver would stop independently of the target change. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#3791

github-actions bot added the CUDA/C++ label Apr 23, 2021

achirkin added 2 - In Progress Currenty a work in progress bug Something isn't working non-breaking Non-breaking change labels Apr 23, 2021

achirkin requested a review from tfeher April 23, 2021 16:16

achirkin changed the title ~~[WIP] Tolerate QN linesearch failures when it's harmless~~ Tolerate QN linesearch failures when it's harmless Apr 26, 2021

achirkin marked this pull request as ready for review April 26, 2021 13:58

achirkin requested a review from a team as a code owner April 26, 2021 13:58

achirkin added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currenty a work in progress labels Apr 26, 2021

dantegd assigned tfeher Apr 26, 2021

cjnolet self-assigned this Apr 26, 2021

cjnolet requested changes Apr 26, 2021

View reviewed changes

cpp/src/glm/qn/qn_solvers.cuh Outdated Show resolved Hide resolved

cpp/src/glm/qn/qn_solvers.cuh Outdated Show resolved Hide resolved

Tolerate linesearch failures sometimes

01db379

achirkin marked this pull request as draft April 27, 2021 09:18

achirkin added 2 - In Progress Currenty a work in progress and removed 3 - Ready for Review Ready for review by team labels Apr 27, 2021

Fix a typo and print the LS return code

687eb35

achirkin force-pushed the bug-ext-qn-linesearch-failure branch from 580cf77 to 687eb35 Compare April 27, 2021 10:06

dantegd added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 2 - In Progress Currenty a work in progress labels May 14, 2021

cjnolet added 4 - Waiting on Author Waiting for author to respond to review and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels May 14, 2021

github-actions bot added conda conda issue Cython / Python Cython or Python issue gpuCI gpuCI issue labels Jul 21, 2021

achirkin changed the base branch from branch-21.06 to branch-21.08 July 21, 2021 07:46

Make linesearch complain only if not converged.

d2055c7

github-actions bot removed CMake gpuCI gpuCI issue conda conda issue Cython / Python Cython or Python issue labels Jul 21, 2021

Stop due to linesearch doing nothing.

63cca81

achirkin added 3 - Ready for Review Ready for review by team and removed 4 - Waiting on Author Waiting for author to respond to review labels Jul 21, 2021

achirkin marked this pull request as ready for review July 21, 2021 12:43

achirkin requested a review from cjnolet July 21, 2021 12:45

tfeher requested changes Jul 22, 2021

View reviewed changes

cpp/src/glm/qn/qn_solvers.cuh Show resolved Hide resolved

cpp/src/glm/qn/qn_solvers.cuh Outdated Show resolved Hide resolved

tfeher reviewed Jul 22, 2021

View reviewed changes

cpp/src/glm/qn/qn_solvers.cuh Outdated Show resolved Hide resolved

achirkin and others added 3 commits July 22, 2021 11:15

Add more comments and bring back some trace messages

5ce7796

Update cpp/src/glm/qn/qn_solvers.cuh

18ced01

Co-authored-by: Tamas Bela Feher <[email protected]>

Make the debug messages less worrying for the users

c9aaab2

achirkin requested a review from tfeher July 22, 2021 10:48

tfeher approved these changes Jul 22, 2021

View reviewed changes

cjnolet approved these changes Jul 22, 2021

View reviewed changes

rapids-bot bot merged commit 0a6c5ba into rapidsai:branch-21.08 Jul 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tolerate QN linesearch failures when it's harmless #3791

Tolerate QN linesearch failures when it's harmless #3791

achirkin commented Apr 23, 2021 •

edited

Loading

achirkin commented Apr 26, 2021 •

edited

Loading

cjnolet left a comment

achirkin commented Apr 27, 2021

dantegd commented May 14, 2021

cjnolet commented May 14, 2021

achirkin commented May 15, 2021

achirkin commented Jul 21, 2021

tfeher left a comment

codecov-commenter commented Jul 22, 2021

tfeher left a comment

cjnolet left a comment

cjnolet commented Jul 22, 2021

Tolerate QN linesearch failures when it's harmless #3791

Tolerate QN linesearch failures when it's harmless #3791

Conversation

achirkin commented Apr 23, 2021 • edited Loading

achirkin commented Apr 26, 2021 • edited Loading

cjnolet left a comment

Choose a reason for hiding this comment

achirkin commented Apr 27, 2021

dantegd commented May 14, 2021

cjnolet commented May 14, 2021

achirkin commented May 15, 2021

achirkin commented Jul 21, 2021

tfeher left a comment

Choose a reason for hiding this comment

codecov-commenter commented Jul 22, 2021

Codecov Report

tfeher left a comment

Choose a reason for hiding this comment

cjnolet left a comment

Choose a reason for hiding this comment

cjnolet commented Jul 22, 2021

achirkin commented Apr 23, 2021 •

edited

Loading

achirkin commented Apr 26, 2021 •

edited

Loading