
Improve QN solver stopping conditions (logistic regression) to match sklearn closer #3766

Merged
merged 7 commits into from
Apr 23, 2021

Conversation

achirkin
Contributor

@achirkin achirkin commented Apr 19, 2021

Change the QN solver (logistic regression) stopping conditions to avoid early stops in some cases (#3645):

  • primary:
    || f' ||_inf <= fmag * param.epsilon
    
  • secondary:
    |f - f_prev| <= fmag * param.delta
    

where fmag = max(|f|, param.epsilon).

Also change the default value of tol in QN solver (which sets param.delta) to be consistent (1e-4) with the logistic regression solver.
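The proposed conditions can be sketched in a few lines of Python. This is an illustrative re-statement of the formulas above, not the actual cuML C++ implementation; the function name and default values of `epsilon`/`delta` are placeholders.

```python
import numpy as np

def qn_should_stop(grad, f, f_prev, epsilon=1e-6, delta=1e-4):
    """Return True if the QN solver should stop (sketch of the proposed rules)."""
    fmag = max(abs(f), epsilon)
    # primary: L-inf norm of the gradient, scaled by the objective magnitude
    if np.max(np.abs(grad)) <= fmag * epsilon:
        return True
    # secondary: change of the objective between iterations, same scaling
    if abs(f - f_prev) <= fmag * delta:
        return True
    return False
```

Note that both thresholds scale with `fmag`, so the check does not loosen as the number of weights grows.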

Background

The original primary stopping condition is inconsistent with the sklearn reference implementation and is often triggered too early:

|| f' ||_2 <= param.epsilon * max(1.0, || x ||_2)

Here are the sklearn conditions for reference:

  • primary:
    || grad f ||_inf <= gtol
    
  • secondary:
    |f - f_prev| <= ftol * max(|f|, |f_prev|, 1.0)
    

where gtol is an exposed parameter analogous to param.epsilon, and ftol = 2.2e-9 (hardcoded).
In addition, f in sklearn is scaled with the sample size (softmax or sigmoid over the dataset), so it's not directly comparable to the cuML version.
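For comparison, the sklearn/scipy conditions above can be sketched the same way. The function name is illustrative; `gtol` stands for sklearn's exposed tolerance and `ftol = 2.2e-9` approximates scipy's hardcoded L-BFGS value.

```python
def sklearn_should_stop(grad_inf_norm, f, f_prev, gtol=1e-4, ftol=2.2e-9):
    """Sketch of the sklearn/scipy stopping rules, for comparison."""
    if grad_inf_norm <= gtol:  # primary: unscaled L-inf gradient norm
        return True
    # secondary: relative objective change, floored at 1.0
    return abs(f - f_prev) <= ftol * max(abs(f), abs(f_prev), 1.0)
```

The key difference from the proposal is that the primary threshold here is not scaled by the objective magnitude.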

Currently, cuML checks the gradient w.r.t. the logistic regression weights x. As a result, the tolerance value goes up with the number of classes and features; the model stops too early and stays underfit. This may in part be a cause of #3645.
In this proposal I change the stopping condition to be closer to the sklearn version, but trade some consistency with sklearn for better scaling (the tolerance scales with the absolute value of the objective function). Without this scaling, the sklearn version often seems to run until the maximum iteration limit is reached.
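A small synthetic illustration of how the original threshold loosens with model size: for weights of a fixed per-coordinate magnitude (0.1 below, arbitrary), `eps * max(1, ||x||_2)` grows like the square root of the number of coefficients once `||x||_2` exceeds 1. The numbers are made up for the demonstration only.

```python
import numpy as np

eps = 1e-4
thresholds = []
for n in (10, 1_000, 100_000):
    x = np.full(n, 0.1)  # flattened weight matrix of n coefficients
    thresholds.append(eps * max(1.0, np.linalg.norm(x)))
print(thresholds)  # threshold grows ~sqrt(n) once ||x||_2 > 1
```

With more classes and features, the same per-coordinate gradient magnitude passes a much looser check, which is consistent with the early stopping observed in #3645.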

@achirkin achirkin requested a review from a team as a code owner April 19, 2021 11:28
@GPUtester
Contributor

Can one of the admins verify this patch?


@achirkin
Contributor Author

I'd like @tfeher to have a look at this first.

@achirkin
Contributor Author

FYI, there are a few more differences between cuML and sklearn, which haven't been addressed here directly:

  • sklearn's multinomial loss is a sum of softmax values over the data rows, while cuML's loss is an average
  • the default tolerance is 1e-3 (cuML) vs 1e-4 (sklearn)
  • L-BFGS default memory parameter is 5 (cuML) vs 10 (sklearn)
  • the default max_iter 1000 (cuML) vs 100 (sklearn)

@raydouglass raydouglass added CUDA / C++ CUDA issue and removed CUDA/C++ labels Apr 19, 2021
@tfeher tfeher self-assigned this Apr 19, 2021
@achirkin achirkin changed the title Change the stopping condition to L-inf scaled with |fx| [WIP] Change the stopping condition to L-inf scaled with |fx| Apr 20, 2021
@achirkin
Contributor Author

With our rather big default max_iter = 1000 and the new stopping conditions, the model may get stuck doing minuscule steps for hundreds of iterations. We have params.delta to avoid that, but it's set to zero by default with no way to change it from the user side. It may be worth changing that in this PR too. Loss value/gradient:

[Figure: loss value/gradient over iterations (issue-3645-m10-maxiter1000-tol4)]

The blue line is ours (visually not as bad as the sklearn version, yet there is clearly room for improvement :). The question is whether iterations 200-600 improve the accuracy at all, and whether they are worth the time.

@achirkin achirkin marked this pull request as draft April 20, 2021 07:55
@github-actions github-actions bot added Cython / Python Cython or Python issue CUDA/C++ labels Apr 20, 2021
@achirkin achirkin changed the title [WIP] Change the stopping condition to L-inf scaled with |fx| [WIP] Improve QN solver stopping conditions (logistic regression) Apr 20, 2021
@achirkin
Contributor Author

This one with params.delta enabled:

[Figure: loss value/gradient over iterations with params.delta enabled (issue-3645-m5-maxiter1000-tol4-fixdelta)]

This also changed the minimum correlation between cuML and sklearn weights from 0.998 to 0.97, while the accuracy did not change at all.

@tfeher tfeher added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 20, 2021
@achirkin
Contributor Author

Removed the API-changing commits to add them in a separate PR.

@achirkin achirkin changed the title [WIP] Improve QN solver stopping conditions (logistic regression) Improve QN solver stopping conditions (logistic regression) to match sklearn closer Apr 20, 2021
@achirkin achirkin marked this pull request as ready for review April 20, 2021 12:50
@achirkin achirkin requested a review from a team as a code owner April 20, 2021 12:50
Contributor

@tfeher tfeher left a comment


Thanks Artem for this PR! It looks good in general. The PR description should be updated by adding the secondary stopping condition from scipy. Let's discuss the details offline.

python/cuml/solvers/qn.pyx
cpp/src/glm/qn/simple_mat.cuh
@achirkin achirkin requested a review from tfeher April 21, 2021 14:04
@cjnolet
Member

cjnolet commented Apr 21, 2021

@achirkin, thank you for working to help fix #3645 so quickly! This also helps explain why I didn't see a whole lot of difference from increasing the max iterations. Overall, I think these changes sound good, and I'm in the process of running your changes on the MRE.

Member

@cjnolet cjnolet left a comment


Just some minor initial review feedback

python/cuml/solvers/qn.pyx (outdated)
@JohnZed
Contributor

JohnZed commented Apr 22, 2021

add to allowlist

@dantegd dantegd added the 4 - Waiting on Author Waiting for author to respond to review label Apr 22, 2021
@achirkin achirkin added 3 - Ready for Review Ready for review by team and removed 4 - Waiting on Author Waiting for author to respond to review labels Apr 23, 2021
@achirkin achirkin requested a review from tfeher April 23, 2021 08:58
@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.20@a0cec3e).
The diff coverage is n/a.


@@              Coverage Diff               @@
##             branch-0.20    #3766   +/-   ##
==============================================
  Coverage               ?   86.04%           
==============================================
  Files                  ?      225           
  Lines                  ?    17117           
  Branches               ?        0           
==============================================
  Hits                   ?    14729           
  Misses                 ?     2388           
  Partials               ?        0           
Flag Coverage Δ
dask 49.28% <0.00%> (?)
non-dask 77.95% <0.00%> (?)

Flags with carried forward coverage won't be shown.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0cec3e...bd96fda.

Contributor

@tfeher tfeher left a comment


Thanks Artem for the updates, the PR looks good to go!

@tfeher tfeher added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Apr 23, 2021
@tfeher
Contributor

tfeher commented Apr 23, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6584bbf into rapidsai:branch-0.20 Apr 23, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
…sklearn closer (rapidsai#3766)


Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#3766