Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Least Angle Regression #3160

Merged
merged 24 commits into from
Dec 1, 2020
Merged

Conversation

tfeher
Copy link
Contributor

@tfeher tfeher commented Nov 19, 2020

This PR implements Least Angle Regression (LARS).

Lars is a model selection method, we select a number of features, for the prediction (controlled by the n_nonzero_coefs arg) and determine their regression coefficients.

The solver is implemented according to the paper by Efron, Hastie, Johnstone and Tibshirani.

This PR depends on RAFT PRs rapidsai/raft#94 and rapidsai/raft#95 and rapidsai/raft#102, and rapidsai/raft#103.

@tfeher tfeher requested review from a team as code owners November 19, 2020 15:08
@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

Copy link
Member

@teju85 teju85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also provide some perf numbers here for reference purposes?

cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
@dantegd dantegd added 4 - Waiting on Author Waiting for author to respond to review CUDA / C++ CUDA issue Cython / Python Cython or Python issue New Algorithm For tracking new algorithms that will be added to our existing collection labels Nov 21, 2020
Copy link
Contributor Author

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @teju85 for the comments, I have addressed the issues.

cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
@tfeher
Copy link
Contributor Author

tfeher commented Nov 23, 2020

I have moved LARS to the experimental namespace. There are two known problems:

@tfeher
Copy link
Contributor Author

tfeher commented Nov 24, 2020

@teju85 Here are some preliminary perf numbers, measured (V100 vs Intel Xeon E5-2698). This was measured with default params for the regressor, using a synthetic dataset generated by make_regression. Note that LARS usually works with ncol x ncol Gram matrix therefore sckit-learn's solver is very fast for a large fraction of the parameter space. Our GPU solver improves on that and ensures that we remain fast even with a large number of rows or columns.
lars_cpp_speedup

@tfeher tfeher removed the 4 - Waiting on Author Waiting for author to respond to review label Nov 24, 2020
@tfeher
Copy link
Contributor Author

tfeher commented Nov 24, 2020

Removed the waiting on author label since the two remaining issues that I am addressing are not affecting the Cython wrappers.

@JohnZed JohnZed added the 4 - Waiting on Reviewer Waiting for reviewer to review or respond label Nov 24, 2020
Copy link
Contributor

@drobison00 drobison00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went over the cuh and pyx implementation code, looks good to me. Noted a couple typos / missing params.

python/cuml/linear_model/lars.pyx Outdated Show resolved Hide resolved
python/cuml/linear_model/lars.pyx Outdated Show resolved Hide resolved
python/cuml/linear_model/lars.pyx Outdated Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
Copy link
Contributor Author

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @drobison00 for the review, I have addressed the issues.

Additionally, collinearity detection was added to the cpp solver, and the related pytest are now enobled.

cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Outdated Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
cpp/src/solver/lars_impl.cuh Show resolved Hide resolved
@JohnZed
Copy link
Contributor

JohnZed commented Nov 24, 2020

Because this is far along in review and going into the experimental namespace, I believe we can still push it for 0.17 (assuming no new issues emerge) despite being in burndown now.

- Convert input to fp64 to avoid problem with fp32 input
- Improved debug logs
- Added cpp unit test with n_rows = 65536
- Avoid error during CUDA kernel calls if n_active == 0
- Correct indexing error for x_scale
- Test normalize param
- Move precomputed Gram wrapping to the main fit method
@tfeher
Copy link
Contributor Author

tfeher commented Nov 25, 2020

I believe the solver is fairly robust in fp64. The bug in fp32 mode (#3189) is resolved avoided by automatically converting the data to fp64. I am not aware of any other issues. Will need to merge rapidsai/raft#94 before the CI can build this.

@dantegd
Copy link
Member

dantegd commented Nov 26, 2020

@tfeher rapidsai/raft#94 has been merged

@codecov-io
Copy link

codecov-io commented Nov 28, 2020

Codecov Report

Merging #3160 (85ddedb) into branch-0.17 (898f480) will increase coverage by 12.07%.
The diff coverage is 91.61%.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-0.17    #3160       +/-   ##
================================================
+ Coverage        59.20%   71.27%   +12.07%     
================================================
  Files              142      200       +58     
  Lines             8966    15923     +6957     
================================================
+ Hits              5308    11349     +6041     
- Misses            3658     4574      +916     
Impacted Files Coverage Δ
python/cuml/cluster/__init__.py 100.00% <ø> (ø)
python/cuml/common/__init__.py 100.00% <ø> (ø)
...ython/cuml/dask/ensemble/randomforestclassifier.py 29.48% <ø> (ø)
python/cuml/dask/ensemble/randomforestregressor.py 34.54% <ø> (ø)
python/cuml/dask/metrics/__init__.py 80.00% <0.00%> (ø)
python/cuml/dask/naive_bayes/naive_bayes.py 42.10% <0.00%> (ø)
python/cuml/dask/neighbors/kneighbors_regressor.py 29.85% <0.00%> (-1.40%) ⬇️
python/cuml/dask/solvers/__init__.py 80.00% <0.00%> (ø)
python/cuml/metrics/pairwise_distances.pyx 98.83% <ø> (ø)
python/cuml/metrics/regression.pyx 95.45% <ø> (ø)
... and 160 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 397122e...85ddedb. Read the comment docs.

@cjnolet
Copy link
Member

cjnolet commented Nov 29, 2020

rerun tests

@tfeher
Copy link
Contributor Author

tfeher commented Nov 30, 2020

The cpp unit tests are incorrectly marked as passed (although the error is unrelated to LARS, the problem is with columnSort #3196).

Copy link
Contributor

@JohnZed JohnZed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have a few smaller suggestions (mostly little tests and a question). For an experimental module like this, I believe these can be handled in follow-ons, so I'm pre-approving.

python/cuml/experimental/linear_model/lars.pyx Outdated Show resolved Hide resolved
python/cuml/test/test_lars.py Show resolved Hide resolved
python/cuml/test/test_lars.py Outdated Show resolved Hide resolved
python/cuml/test/test_lars.py Show resolved Hide resolved
@tfeher tfeher added feature request New feature or request non-breaking Non-breaking change labels Dec 1, 2020
Copy link
Contributor Author

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JohnZed for the review, I have fixed the issues!

python/cuml/experimental/linear_model/lars.pyx Outdated Show resolved Hide resolved
python/cuml/test/test_lars.py Show resolved Hide resolved
python/cuml/test/test_lars.py Show resolved Hide resolved
python/cuml/test/test_lars.py Outdated Show resolved Hide resolved
@JohnZed JohnZed merged commit da25d82 into rapidsai:branch-0.17 Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on Reviewer Waiting for reviewer to review or respond CUDA / C++ CUDA issue Cython / Python Cython or Python issue feature request New feature or request New Algorithm For tracking new algorithms that will be added to our existing collection non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants