[REVIEW] Least Angle Regression #3160
Please update the changelog in order to start CI tests.
Can you also provide some perf numbers here for reference purposes?
Thanks @teju85 for the comments, I have addressed the issues.
I have moved LARS to the experimental namespace. There are two known problems:
|
@teju85 Here are some preliminary perf numbers, measured on a V100 GPU versus an Intel Xeon E5-2698 CPU, with default params for the regressor and a synthetic dataset generated by make_regression. Note that LARS usually works with the ncol x ncol Gram matrix, so scikit-learn's solver is very fast for a large fraction of the parameter space. Our GPU solver improves on that and ensures that we remain fast even with a large number of rows or columns.
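To illustrate the Gram matrix remark, here is a minimal NumPy sketch (not the cuML or scikit-learn implementation). It assumes the usual precompute trick: once G = X^T X and X^T y are formed, the correlations of the columns with the residual can be updated using only ncol-sized quantities. The array sizes and variable names are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 10_000, 20  # hypothetical sizes, chosen only for illustration
X = rng.standard_normal((n_rows, n_cols))
y = X @ rng.standard_normal(n_cols)

# Precomputed once; both sizes depend only on n_cols, not on n_rows.
gram = X.T @ X  # shape (n_cols, n_cols)
xty = X.T @ y   # shape (n_cols,)

# Correlation of every column with the residual y - X @ beta,
# computed without touching the n_rows-sized X again.
beta = rng.standard_normal(n_cols)
corr_from_gram = xty - gram @ beta

# Reference computation that goes through the full data matrix.
corr_direct = X.T @ (y - X @ beta)

print(gram.shape, np.allclose(corr_from_gram, corr_direct))
```

After the one-time cost of forming the Gram matrix, the per-iteration work no longer scales with the number of rows, which is consistent with the CPU solver being fast over much of the parameter space.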
…ith alignment check
Additionally adjust Gram matrix condition for precompute='auto'
Removed the "waiting on author" label, since the two remaining issues that I am addressing do not affect the Cython wrappers.
Went over the cuh and pyx implementation code, looks good to me. Noted a couple typos / missing params.
Thanks @drobison00 for the review, I have addressed the issues.
Additionally, collinearity detection was added to the cpp solver, and the related pytests are now enabled.
Because this is far along in review and going into the experimental namespace, I believe we can still push it for 0.17 (assuming no new issues emerge) despite being in burndown now.
- Convert input to fp64 to avoid problem with fp32 input
- Improved debug logs
- Added cpp unit test with n_rows = 65536
- Avoid error during CUDA kernel calls if n_active == 0
- Correct indexing error for x_scale
- Test normalize param
- Move precomputed Gram wrapping to the main fit method
I believe the solver is fairly robust in fp64. The bug in fp32 mode (#3189) is resolved.
@tfeher rapidsai/raft#94 has been merged
Codecov Report
@@              Coverage Diff               @@
##           branch-0.17    #3160     +/-   ##
==============================================
+ Coverage        59.20%   71.27%   +12.07%
==============================================
  Files              142      200       +58
  Lines             8966    15923     +6957
==============================================
+ Hits              5308    11349     +6041
- Misses            3658     4574      +916
rerun tests
The cpp unit tests are incorrectly marked as passed (although the error is unrelated to LARS, the problem is with columnSort #3196).
Looks good! I have a few smaller suggestions (mostly little tests and a question). For an experimental module like this, I believe these can be handled in follow-ons, so I'm pre-approving.
Thanks @JohnZed for the review, I have fixed the issues!
This PR implements Least Angle Regression (LARS).
LARS is a model selection method: it selects a subset of features for the prediction (the number is controlled by the n_nonzero_coefs arg) and determines their regression coefficients.
The solver is implemented according to the paper by Efron, Hastie, Johnstone and Tibshirani.
This PR depends on RAFT PRs rapidsai/raft#94, rapidsai/raft#95, rapidsai/raft#102, and rapidsai/raft#103.
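As a rough illustration of what n_nonzero_coefs controls, the sketch below uses scikit-learn's Lars as a CPU stand-in. This is an assumption made for the example: the estimator added in this PR lives in cuML's experimental namespace and may differ in details. The dataset sizes are arbitrary, and the data comes from make_regression as in the perf measurements above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars

# Synthetic data: 10 features, only 3 of which carry signal (illustrative sizes).
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

# Ask LARS to stop once at most 3 features have entered the model.
model = Lars(n_nonzero_coefs=3)
model.fit(X, y)

# Indices of the selected features and their regression coefficients.
selected = np.flatnonzero(model.coef_)
print("selected features:", selected)
print("coefficients:", model.coef_[selected])
```

All other entries of coef_ stay at zero, so the fitted model predicts from the selected features only.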