[REVIEW] Least Angle Regression #3160

tfeher · 2020-11-19T15:08:55Z

This PR implements Least Angle Regression (LARS).

LARS is a model selection method: we select a number of features for the prediction (controlled by the n_nonzero_coefs argument) and determine their regression coefficients.

The solver is implemented according to the paper by Efron, Hastie, Johnstone and Tibshirani.

This PR depends on RAFT PRs rapidsai/raft#94, rapidsai/raft#95, rapidsai/raft#102, and rapidsai/raft#103.
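For readers following along, here is a minimal sketch of what n_nonzero_coefs controls. It uses scikit-learn's CPU `Lars` estimator as a stand-in rather than the cuML code from this PR. The data sizes, noise level, and seed are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars

# Synthetic data: 20 features, only 5 of which carry signal (assumed sizes).
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=0)

# Stop once at most 3 features have entered the model.
model = Lars(n_nonzero_coefs=3).fit(X, y)

n_selected = int(np.count_nonzero(model.coef_))
print("selected feature indices:", list(model.active_))
print("number of nonzero coefficients:", n_selected)
```

All coefficients outside the selected set stay at exactly zero, which is what makes the method usable for feature selection.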

GPUtester · 2020-11-19T15:09:26Z

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

teju85

Can you also provide some perf numbers here for reference purposes?

cpp/src/solver/lars_impl.cuh

tfeher

Thanks @teju85 for the comments, I have addressed the issues.

cpp/src/solver/lars_impl.cuh

tfeher · 2020-11-23T14:16:35Z

I have moved LARS to the experimental namespace. There are two known problems:

- Handling collinear input. I have added a Python unit test for that and am working on a fix. scikit-learn detects the problem using this check; I will evaluate whether this is the best course for us. [Update: collinearity detection is implemented.]
- ~~cuBLAS error~~ Input data error in fp32 mode. The fp32 stress test is a reproducer on V100 with CUDA 10.2. I will check whether the error also exists with other library versions. [Update: opened issue "[BUG] LARS solver in fp32 mode fails to fit due to NaNs in X" (#3189) to track this.]
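One common way to detect a collinear candidate column can be sketched as follows. When a column is appended to the Cholesky factor of the active Gram matrix, the new diagonal entry is the column's distance from the span of the active columns, so a value near zero signals collinearity. This is an illustrative NumPy sketch, not the check used in this PR or in scikit-learn. The helper names and the `tol` threshold are hypothetical.

```python
import numpy as np

def residual_fraction(X_active, x_new):
    """Fraction of ||x_new||^2 that lies outside the span of the active columns."""
    G = X_active.T @ X_active
    L = np.linalg.cholesky(G)                    # G = L @ L.T
    w = np.linalg.solve(L, X_active.T @ x_new)   # w @ w = squared norm of the projection
    d2 = x_new @ x_new - w @ w                   # squared new Cholesky diagonal entry
    return max(d2, 0.0) / (x_new @ x_new)

def is_collinear(X_active, x_new, tol=1e-10):
    # tol is a hypothetical threshold chosen for this illustration only.
    return residual_fraction(X_active, x_new) < tol

rng = np.random.default_rng(0)
X_active = rng.standard_normal((50, 3))
dependent = 2.0 * X_active[:, 0] - X_active[:, 1]   # exact linear combination
independent = rng.standard_normal(50)

print(is_collinear(X_active, dependent), is_collinear(X_active, independent))
```

A relative threshold is used here so the result does not depend on how the columns are scaled.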

tfeher · 2020-11-24T13:52:42Z

@teju85 Here are some preliminary perf numbers (V100 vs Intel Xeon E5-2698), measured with the regressor's default parameters on a synthetic dataset generated by make_regression. Note that LARS usually works with the ncol x ncol Gram matrix, so scikit-learn's solver is very fast for a large fraction of the parameter space. Our GPU solver improves on that and ensures that we remain fast even with a large number of rows or columns.
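The point about the Gram matrix can be illustrated with a small NumPy sketch (array sizes and variable names are assumptions for illustration). Once G = XᵀX and Xᵀy are precomputed, the correlation of every feature with the current residual can be updated without touching the n_rows dimension again.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 10_000, 8
X = rng.standard_normal((n_rows, n_cols))
y = rng.standard_normal(n_rows)
coef = rng.standard_normal(n_cols)   # stand-in for the current coefficients

# Direct form: every evaluation touches all n_rows.
corr_direct = X.T @ (y - X @ coef)

# Precomputed form: G and Xy are built once; each later update costs O(n_cols^2).
G = X.T @ X      # shape (n_cols, n_cols)
Xy = X.T @ y     # shape (n_cols,)
corr_gram = Xy - G @ coef

print(G.shape, np.allclose(corr_direct, corr_gram))
```

The trade-off is the one-time cost and memory of forming G, which is why a precompute heuristic typically depends on the ratio of rows to columns.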

tfeher · 2020-11-24T14:19:09Z

Removed the "Waiting on Author" label, since the two remaining issues that I am addressing do not affect the Cython wrappers.

drobison00

Went over the cuh and pyx implementation code, looks good to me. Noted a couple typos / missing params.

python/cuml/linear_model/lars.pyx

cpp/src/solver/lars_impl.cuh

tfeher

Thanks @drobison00 for the review, I have addressed the issues.

Additionally, collinearity detection was added to the cpp solver, and the related pytests are now enabled.

cpp/src/solver/lars_impl.cuh

JohnZed · 2020-11-24T23:34:10Z

Because this is far along in review and going into the experimental namespace, I believe we can still push it for 0.17 (assuming no new issues emerge) despite being in burndown now.

- Convert input to fp64 to avoid problem with fp32 input
- Improved debug logs
- Added cpp unit test with n_rows = 65536
- Avoid error during CUDA kernel calls if n_active == 0
- Correct indexing error for x_scale
- Test normalize param
- Move precomputed Gram wrapping to the main fit method

tfeher · 2020-11-25T21:29:34Z

I believe the solver is fairly robust in fp64. The bug in fp32 mode (#3189) is resolved ~~avoided by automatically converting the data to fp64~~. I am not aware of any other issues. Will need to merge rapidsai/raft#94 before the CI can build this.

dantegd · 2020-11-26T18:49:23Z

@tfeher rapidsai/raft#94 has been merged

codecov-io · 2020-11-28T12:35:11Z

Codecov Report

Merging #3160 (85ddedb) into branch-0.17 (898f480) will increase coverage by 12.07%.
The diff coverage is 91.61%.

@@               Coverage Diff                @@
##           branch-0.17    #3160       +/-   ##
================================================
+ Coverage        59.20%   71.27%   +12.07%     
================================================
  Files              142      200       +58     
  Lines             8966    15923     +6957     
================================================
+ Hits              5308    11349     +6041     
- Misses            3658     4574      +916

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cuml/cluster/__init__.py | `100.00% <ø> (ø)` | |
| python/cuml/common/__init__.py | `100.00% <ø> (ø)` | |
| ...ython/cuml/dask/ensemble/randomforestclassifier.py | `29.48% <ø> (ø)` | |
| python/cuml/dask/ensemble/randomforestregressor.py | `34.54% <ø> (ø)` | |
| python/cuml/dask/metrics/__init__.py | `80.00% <0.00%> (ø)` | |
| python/cuml/dask/naive_bayes/naive_bayes.py | `42.10% <0.00%> (ø)` | |
| python/cuml/dask/neighbors/kneighbors_regressor.py | `29.85% <0.00%> (-1.40%)` | ⬇️ |
| python/cuml/dask/solvers/__init__.py | `80.00% <0.00%> (ø)` | |
| python/cuml/metrics/pairwise_distances.pyx | `98.83% <ø> (ø)` | |
| python/cuml/metrics/regression.pyx | `95.45% <ø> (ø)` | |

... and 160 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 397122e...85ddedb. Read the comment docs.

cjnolet · 2020-11-29T04:03:04Z

rerun tests

tfeher · 2020-11-30T18:01:50Z

The cpp unit tests are incorrectly marked as passed. The underlying error is unrelated to LARS; the problem is with columnSort (#3196).

JohnZed

Looks good! I have a few smaller suggestions (mostly little tests and a question). For an experimental module like this, I believe these can be handled in follow-ons, so I'm pre-approving.

python/cuml/experimental/linear_model/lars.pyx

python/cuml/test/test_lars.py

tfeher

Thanks @JohnZed for the review, I have fixed the issues!

python/cuml/experimental/linear_model/lars.pyx

python/cuml/test/test_lars.py

tfeher requested review from a team as code owners November 19, 2020 15:08

tfeher force-pushed the fea-ext-lars branch from 95d2f0e to 59c7a67 Compare November 19, 2020 15:14

teju85 requested changes Nov 20, 2020

dantegd added the 4 - Waiting on Author, CUDA / C++, Cython / Python, and New Algorithm labels Nov 21, 2020

tfeher commented Nov 23, 2020

teju85 reviewed Nov 23, 2020

teju85 approved these changes Nov 23, 2020

tfeher mentioned this pull request Nov 23, 2020

[REVIEW] Epsilon parameter for Cholesky rank one update rapidsai/raft#103

Merged

tfeher added 10 commits November 24, 2020 15:11

FEA Lars solver

ca8cd5c

Fix scaling of X and alpha, use n_cols in calcMaxStep, use binaryOp with alignment check

e4775ba

Improve memory error handling

cf9b5d6

Additionally adjust Gram matrix condition for precompute='auto'

DOC undo whitespace edit in changelog

ed49e6a

Add eps parameter and improve numeric error handling

3ff78c8

Fix include style

1ddcb66

Set coef_path[:,0] to zeros

3f19bd9

Add more extensive tests

fd596d0

Move LARS to experimantal namespace

4e64ba3

Remove unused imports

cd902d3

tfeher force-pushed the fea-ext-lars branch from 97c4537 to cd902d3 Compare November 24, 2020 14:12

tfeher removed the 4 - Waiting on Author label Nov 24, 2020

JohnZed added the 4 - Waiting on Reviewer label Nov 24, 2020

drobison00 approved these changes Nov 24, 2020

tfeher added 2 commits November 24, 2020 21:44

FEA Detect and avoid collinear features

cea577d

DOC fix docstrings

f789ddc

tfeher commented Nov 24, 2020

tfeher added 5 commits November 27, 2020 08:00

Correct docstring and Python style

f363102

DOC Remove stray comma that triggered doxygen error

f0443e1

Update RAFT GIT_TAG

acdf7db

Correct __init__.py after moving LARS to experimental namespace

0bf55d4

Fix implicit type conversion error and enable FP32 training

bda29cc

Fix base parameter docs and get_param_names

7ba4b52

Merge branch 'branch-0.17' into fea-ext-lars

85ddedb

JohnZed approved these changes Dec 1, 2020

tfeher added the feature request and non-breaking labels Dec 1, 2020

tfeher added 4 commits December 1, 2020 15:13

Define explicit dtype for the intercept attribute

cde674d

Improve Lars test coverage, and decrease test tolerance

fa4bf93

Merge branch 'branch-0.17' into fea-ext-lars

f43e22d

Merge remote-tracking branch 'tfeher/fea-ext-lars' into fea-ext-lars

b0d510b

tfeher commented Dec 1, 2020

JohnZed merged commit da25d82 into rapidsai:branch-0.17 Dec 1, 2020
