
Improve QN solver stopping conditions (logistic regression) to match sklearn closer #3766

Merged
merged 7 commits into from
Apr 23, 2021

Conversation

achirkin
Contributor

@achirkin achirkin commented Apr 19, 2021

Change the QN solver (logistic regression) stopping conditions to avoid early stops in some cases (#3645):

  • primary:
    || f' ||_inf <= fmag * param.epsilon
    
  • secondary:
    |f - f_prev| <= fmag * param.delta
    

where fmag = max(|f|, param.epsilon).

Also change the default value of tol in QN solver (which sets param.delta) to be consistent (1e-4) with the logistic regression solver.
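The proposed conditions can be sketched in a few lines of Python. This is an illustrative re-statement of the formulas above, not the actual cuML C++ implementation; the function name and default values of `epsilon`/`delta` are placeholders.

```python
import numpy as np

def qn_should_stop(grad, f, f_prev, epsilon=1e-6, delta=1e-4):
    """Return True if the QN solver should stop (sketch of the proposed rules)."""
    fmag = max(abs(f), epsilon)
    # primary: L-inf norm of the gradient, scaled by the objective magnitude
    if np.max(np.abs(grad)) <= fmag * epsilon:
        return True
    # secondary: change of the objective between iterations, same scaling
    if abs(f - f_prev) <= fmag * delta:
        return True
    return False
```

Note that both thresholds scale with `fmag`, so the check does not loosen as the number of weights grows.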

Background

The original primary stopping condition is inconsistent with the sklearn reference implementation and is often triggered too early:

|| f' ||_2 <= param.epsilon * max(1.0, || x ||_2)

Here are the sklearn conditions for reference:

  • primary:
    || grad f ||_inf <= gtol
    
  • secondary:
    |f - f_prev| <= ftol * max(|f|, |f_prev|, 1.0)
    

where gtol is an exposed parameter analogous to param.epsilon, and ftol = 2.2e-9 (hardcoded).
In addition, f in sklearn is scaled with the sample size (softmax or sigmoid over the dataset), so it's not directly comparable to the cuML version.
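For comparison, the sklearn/scipy conditions above can be sketched the same way. The function name is illustrative; `gtol` stands for sklearn's exposed tolerance and `ftol = 2.2e-9` approximates scipy's hardcoded L-BFGS value.

```python
def sklearn_should_stop(grad_inf_norm, f, f_prev, gtol=1e-4, ftol=2.2e-9):
    """Sketch of the sklearn/scipy stopping rules, for comparison."""
    if grad_inf_norm <= gtol:  # primary: unscaled L-inf gradient norm
        return True
    # secondary: relative objective change, floored at 1.0
    return abs(f - f_prev) <= ftol * max(abs(f), abs(f_prev), 1.0)
```

The key difference from the proposal is that the primary threshold here is not scaled by the objective magnitude.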

Currently, cuML checks the gradient w.r.t. the logistic regression weights x. As a result, the tolerance value goes up with the number of classes and features; the model stops too early and stays underfit. This may in part be a cause of #3645.
In this proposal I change the stopping condition to be closer to the sklearn version, but trade some consistency with sklearn for better scaling (the tolerance scales with the absolute value of the objective function). Without this scaling, the sklearn version often seems to run until the maximum iteration limit is reached.
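A small synthetic illustration of how the original threshold loosens with model size: for weights of a fixed per-coordinate magnitude (0.1 below, arbitrary), `eps * max(1, ||x||_2)` grows like the square root of the number of coefficients once `||x||_2` exceeds 1. The numbers are made up for the demonstration only.

```python
import numpy as np

eps = 1e-4
thresholds = []
for n in (10, 1_000, 100_000):
    x = np.full(n, 0.1)  # flattened weight matrix of n coefficients
    thresholds.append(eps * max(1.0, np.linalg.norm(x)))
print(thresholds)  # threshold grows ~sqrt(n) once ||x||_2 > 1
```

With more classes and features, the same per-coordinate gradient magnitude passes a much looser check, which is consistent with the early stopping observed in #3645.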

@achirkin achirkin requested a review from a team as a code owner April 19, 2021 11:28
@GPUtester
Contributor

Can one of the admins verify this patch?


@achirkin
Contributor Author

I'd like @tfeher to have a look at this first.

@achirkin
Contributor Author

FYI, there are a few more differences between cuML and sklearn, which haven't been addressed here directly:

  • sklearn's multinomial loss is a sum of softmax values over the data rows, while cuML's loss is an average
  • the default tolerance is 1e-3 (cuML) vs 1e-4 (sklearn)
  • L-BFGS default memory parameter is 5 (cuML) vs 10 (sklearn)
  • the default max_iter 1000 (cuML) vs 100 (sklearn)

@raydouglass raydouglass added CUDA / C++ CUDA issue and removed CUDA/C++ labels Apr 19, 2021
@tfeher tfeher self-assigned this Apr 19, 2021
@achirkin achirkin changed the title Change the stopping condition to L-inf scaled with |fx| [WIP] Change the stopping condition to L-inf scaled with |fx| Apr 20, 2021
@achirkin
Contributor Author

With our rather big default max_iter = 1000 and the new stopping conditions, the model may get stuck doing minuscule steps for hundreds of iterations. We have params.delta to avoid that, but it's set to zero by default with no way to change it from the user side. It may be worth changing that in this PR too. Loss value/gradient:

[Figure: loss value/gradient over iterations (issue-3645-m10-maxiter1000-tol4)]

The blue line is ours (visually not as bad as the sklearn version, yet there is clearly room for improvement :). The question is whether iterations 200-600 improve the accuracy at all, and whether they are worth the time.

@achirkin achirkin marked this pull request as draft April 20, 2021 07:55
@github-actions github-actions bot added Cython / Python Cython or Python issue CUDA/C++ labels Apr 20, 2021
@achirkin achirkin changed the title [WIP] Change the stopping condition to L-inf scaled with |fx| [WIP] Improve QN solver stopping conditions (logistic regression) Apr 20, 2021
@achirkin
Contributor Author

This one with params.delta enabled:

[Figure: loss value/gradient over iterations with params.delta enabled (issue-3645-m5-maxiter1000-tol4-fixdelta)]

This also changed the minimum correlation between cuML and sklearn weights from 0.998 to 0.97, while the accuracy did not change at all.

@tfeher tfeher added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 20, 2021
@achirkin
Contributor Author

Removed the API-changing commits to add them in a separate PR.

@achirkin achirkin changed the title [WIP] Improve QN solver stopping conditions (logistic regression) Improve QN solver stopping conditions (logistic regression) to match sklearn closer Apr 20, 2021
@achirkin achirkin marked this pull request as ready for review April 20, 2021 12:50
@achirkin achirkin requested a review from a team as a code owner April 20, 2021 12:50
Contributor

@tfeher tfeher left a comment


Thanks Artem for this PR! It looks good in general. The PR description should be updated by adding the secondary stopping condition from scipy. Let's discuss the details offline.

python/cuml/solvers/qn.pyx
cpp/src/glm/qn/simple_mat.cuh
@achirkin achirkin requested a review from tfeher April 21, 2021 14:04
@cjnolet
Member

cjnolet commented Apr 21, 2021

@achirkin, thank you for working to help fix #3645 so quickly! This also helps explain why I didn't see a whole lot of difference from increasing the max iterations. Overall, I think these changes sound good, and I'm in the process of running your changes on the MRE.

Member

@cjnolet cjnolet left a comment


Just some minor initial review feedback

python/cuml/solvers/qn.pyx (outdated)
@JohnZed
Contributor

JohnZed commented Apr 22, 2021

add to allowlist

@dantegd dantegd added the 4 - Waiting on Author Waiting for author to respond to review label Apr 22, 2021
@achirkin achirkin added 3 - Ready for Review Ready for review by team and removed 4 - Waiting on Author Waiting for author to respond to review labels Apr 23, 2021
@achirkin achirkin requested a review from tfeher April 23, 2021 08:58
@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.20@a0cec3e).
The diff coverage is n/a.


@@              Coverage Diff               @@
##             branch-0.20    #3766   +/-   ##
==============================================
  Coverage               ?   86.04%           
==============================================
  Files                  ?      225           
  Lines                  ?    17117           
  Branches               ?        0           
==============================================
  Hits                   ?    14729           
  Misses                 ?     2388           
  Partials               ?        0           
Flag Coverage Δ
dask 49.28% <0.00%> (?)
non-dask 77.95% <0.00%> (?)

Flags with carried forward coverage won't be shown.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0cec3e...bd96fda.

Contributor

@tfeher tfeher left a comment


Thanks Artem for the updates, the PR looks good to go!

@tfeher tfeher added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Apr 23, 2021
@tfeher
Contributor

tfeher commented Apr 23, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6584bbf into rapidsai:branch-0.20 Apr 23, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
…sklearn closer (rapidsai#3766)


Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#3766