Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] 018 add unfitted error pca & tests on IPCA #3272

Merged
merged 8 commits into from
Dec 17, 2020

Conversation

lowener
Copy link
Contributor

@lowener lowener commented Dec 7, 2020

This PR adds the NotFittedError to PCA (and IncrementalPCA) so that users are warned if they use transform or inverse_transform before fitting the model.

I also added tests on this exception.

@lowener lowener requested a review from a team as a code owner December 7, 2020 16:53
@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@lowener lowener changed the title 018 add unfitted error pca [WIP] 018 add unfitted error pca Dec 7, 2020
Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some minor comments since I was taking a peek, and a question/comment regarding generalizing the behavior to all our estimators

python/cuml/decomposition/pca.pyx Show resolved Hide resolved
python/cuml/test/test_incremental_pca.py Outdated Show resolved Hide resolved
@lowener lowener changed the title [WIP] 018 add unfitted error pca [REVIEW] 018 add unfitted error pca Dec 9, 2020
@lowener lowener changed the title [REVIEW] 018 add unfitted error pca [REVIEW] 018 add unfitted error pca & tests on IPCA Dec 9, 2020
@dantegd dantegd added 3 - Ready for Review Ready for review by team Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 9, 2020
@@ -554,6 +555,7 @@ class PCA(Base):

"""

self._check_is_fitted('components_')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that in sklearn, they do sklearn.utils.validation.check_is_fitted in a similar way. Is there an advantage to having this as a method on base instead of a utility? I believe we can always add the function later as a wrapper if necessary, so probably not a major issue either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of a base method would be that there is no additionnal import needed, and it would be really easy to use. But then we would have two copy of this code, for the estimators with Dask.

@@ -736,3 +739,9 @@ class PCA(Base):
'X_types_gpu': ['2darray', 'sparse'],
'X_types': ['2darray', 'sparse']
}

def _check_is_fitted(self, attr):
if not hasattr(self, attr) or (getattr(self, attr) is None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed on the generalization - good suggestion from Dante

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do the generalization of NotFittedError in a following PR

@codecov-io
Copy link

Codecov Report

Merging #3272 (8ddd2a9) into branch-0.18 (2e4388d) will increase coverage by 0.02%.
The diff coverage is 93.08%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.18    #3272      +/-   ##
===============================================
+ Coverage        71.45%   71.48%   +0.02%     
===============================================
  Files              205      207       +2     
  Lines            16594    16747     +153     
===============================================
+ Hits             11858    11972     +114     
- Misses            4736     4775      +39     
Impacted Files Coverage Δ
python/cuml/neighbors/__init__.py 100.00% <ø> (ø)
python/cuml/neighbors/ann.pxd 87.05% <87.05%> (ø)
python/cuml/common/sparsefuncs.py 91.66% <88.23%> (-2.34%) ⬇️
python/cuml/linear_model/base.pyx 97.36% <97.36%> (ø)
python/cuml/neighbors/nearest_neighbors.pyx 92.36% <98.24%> (+1.90%) ⬆️
python/cuml/decomposition/pca.pyx 94.22% <100.00%> (+0.14%) ⬆️
...cuml/experimental/decomposition/incremental_pca.py 92.05% <100.00%> (+5.29%) ⬆️
python/cuml/linear_model/elastic_net.pyx 88.00% <100.00%> (-0.89%) ⬇️
python/cuml/linear_model/lasso.pyx 90.47% <100.00%> (-0.83%) ⬇️
python/cuml/linear_model/linear_regression.pyx 87.95% <100.00%> (-1.68%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b753d4...8ddd2a9. Read the comment docs.

@dantegd dantegd merged commit 47b6296 into rapidsai:branch-0.18 Dec 17, 2020
@lowener lowener deleted the 018_add_unfitted_error_pca branch January 10, 2021 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants