Batched Silhouette Score #3362

Merged: 19 commits into rapidsai:branch-0.18 on Feb 1, 2021

@divyegala divyegala commented Jan 12, 2021

closes #3255

Local tests show that the batched SilhouetteScore implementation performs as well as or better than the full SilhouetteScore implementation when chunksize equals the number of rows in the input matrix. Hence, I have removed the full SilhouetteScore from Cython for now.

I have also created #3368 to investigate improving performance using shared memory. The timeline for this is TBD.
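
For readers unfamiliar with the batching idea, here is a minimal pure-Python sketch of a silhouette score accumulated one chunksize x chunksize tile at a time. It is not the cuML implementation: the function name `batched_silhouette_score` and its structure are hypothetical, Euclidean distance is assumed, and the tile is never materialized as a matrix the way a GPU kernel would. It only illustrates why `chunksize` equal to the number of rows reduces to the unbatched computation.

```python
import math


def batched_silhouette_score(X, labels, chunksize=None):
    """Mean silhouette coefficient, accumulated tile by tile (illustrative only)."""
    n = len(X)
    if chunksize is None or chunksize > n:
        chunksize = n  # one tile: same work as the unbatched computation
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    index = {c: k for k, c in enumerate(clusters)}
    counts = [0] * len(clusters)
    for lab in labels:
        counts[index[lab]] += 1

    # sums[i][k] = total distance from sample i to every sample in cluster k
    sums = [[0.0] * len(clusters) for _ in range(n)]
    for r0 in range(0, n, chunksize):
        for c0 in range(0, n, chunksize):
            # A GPU version would hold this tile as a chunksize x chunksize
            # distance matrix; here each distance is accumulated directly.
            for i in range(r0, min(r0 + chunksize, n)):
                for j in range(c0, min(c0 + chunksize, n)):
                    sums[i][index[labels[j]]] += math.dist(X[i], X[j])

    total = 0.0
    for i in range(n):
        own = index[labels[i]]
        if counts[own] == 1:
            continue  # singleton cluster contributes a silhouette of 0
        a = sums[i][own] / (counts[own] - 1)  # mean intra-cluster distance
        b = min(sums[i][k] / counts[k]
                for k in range(len(clusters)) if k != own)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


# Hypothetical toy data: two well-separated clusters.
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
full = batched_silhouette_score(X, labels)            # chunksize = n rows
tiled = batched_silhouette_score(X, labels, chunksize=2)
print(full, tiled)
```

Both calls accumulate the same per-cluster distance sums, so the score does not depend on the chunk size; only the amount of distance data that must be live at once changes.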

@codecov-io

codecov-io commented Jan 12, 2021

Codecov Report

Merging #3362 (6f36518) into branch-0.18 (550121b) will increase coverage by 0.63%.
The diff coverage is 85.12%.

@@               Coverage Diff               @@
##           branch-0.18    #3362      +/-   ##
===============================================
+ Coverage        71.48%   72.12%   +0.63%     
===============================================
  Files              207      211       +4     
  Lines            16748    17229     +481     
===============================================
+ Hits             11973    12426     +453     
- Misses            4775     4803      +28     
| Impacted Files | Coverage Δ |
|---|---|
| python/cuml/decomposition/incremental_pca.py | 94.70% <ø> (ø) |
| python/cuml/dask/ensemble/base.py | 19.69% <30.43%> (+0.36%) ⬆️ |
| python/cuml/dask/cluster/kmeans.py | 54.00% <33.33%> (ø) |
| python/cuml/ensemble/randomforestregressor.pyx | 70.83% <44.44%> (ø) |
| python/cuml/dask/decomposition/base.py | 39.53% <50.00%> (ø) |
| ...ython/cuml/dask/ensemble/randomforestclassifier.py | 30.00% <50.00%> (+0.51%) ⬆️ |
| python/cuml/dask/ensemble/randomforestregressor.py | 35.08% <50.00%> (+0.54%) ⬆️ |
| python/cuml/dask/linear_model/linear_regression.py | 59.09% <50.00%> (ø) |
| python/cuml/dask/linear_model/ridge.py | 50.00% <50.00%> (ø) |
| ...ython/cuml/dask/neighbors/kneighbors_classifier.py | 22.33% <50.00%> (ø) |

... and 31 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6b5e7ff...6f36518.

@divyegala requested a review from a team as a code owner on January 12, 2021, 21:34
@github-actions bot added the Cython / Python (Cython or Python issue) label on Jan 12, 2021
@divyegala added the labels 3 - Ready for Review (Ready for review by team), breaking (Breaking change), CUDA / C++ (CUDA issue), feature request (New feature or request), and Prim API change (This issue/PR entails a prim API change that can affect current or in development algorithms) on Jan 13, 2021
@cjnolet

cjnolet commented Jan 14, 2021

@divyegala, I can take a look at this tomorrow

@cjnolet cjnolet left a comment

Looks great overall. A couple of very minor things.

cpp/include/cuml/metrics/metrics.hpp (outdated, resolved)
cpp/src_prims/metrics/batched/silhouette_score.cuh (outdated, resolved)
@dantegd added the 4 - Waiting on Author (Waiting for author to respond to review) label and removed the 3 - Ready for Review (Ready for review by team) label on Jan 15, 2021
@github-actions bot removed the CMake label on Jan 19, 2021
@divyegala

@cjnolet this is ready for re-review. For the case chunksize=None, per @JohnZed's suggestion, the code now falls back to chunksize=40000 so that the computation goes through and does not fail for the user. I chose this value because it is the largest chunksize for which the computation succeeds on a 16 GB V100.
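
As a rough back-of-the-envelope check on that default, the sketch below computes the size of one chunksize x chunksize block of pairwise distances. This rests on assumptions not stated in the thread: that such a tile is the dominant allocation, that the element type is float32 or float64, and that 16 GB means decimal gigabytes. The helper `distance_bytes` and the 100,000-row example are hypothetical.

```python
def distance_bytes(n_rows, n_cols, itemsize):
    """Bytes needed to hold an n_rows x n_cols block of pairwise distances."""
    return n_rows * n_cols * itemsize


CHUNKSIZE = 40000        # default quoted in the comment above
GPU_BYTES = 16 * 10**9   # 16 GB card, taken as decimal gigabytes (assumption)

for name, itemsize in (("float32", 4), ("float64", 8)):
    tile = distance_bytes(CHUNKSIZE, CHUNKSIZE, itemsize)
    print(f"{name}: {tile / 1e9:.1f} GB per tile, fits: {tile < GPU_BYTES}")
# float32: 6.4 GB per tile, fits: True
# float64: 12.8 GB per tile, fits: True

# Hypothetical comparison: an unbatched float64 distance matrix for
# 100,000 rows grows quadratically and would not fit on the same card.
full = distance_bytes(100_000, 100_000, 8)
print(f"full matrix: {full / 1e9:.1f} GB, fits: {full < GPU_BYTES}")
# full matrix: 80.0 GB, fits: False
```

Under these assumptions a tile stays below the card's capacity, while the unbatched matrix grows with the square of the row count, which is the quadratic memory usage the linked issue describes.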

@divyegala

rerun tests

@JohnZed

JohnZed commented Jan 28, 2021

rerun tests

@cjnolet cjnolet left a comment

LGTM!

@cjnolet

cjnolet commented Feb 1, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit fb1c810 into rapidsai:branch-0.18 Feb 1, 2021
Labels
4 - Waiting on Author (Waiting for author to respond to review), breaking (Breaking change), CUDA / C++ (CUDA issue), Cython / Python (Cython or Python issue), feature request (New feature or request), libcuml, Prim API change (This issue/PR entails a prim API change that can affect current or in development algorithms)
Development

Successfully merging this pull request may close these issues.

[BUG] Correct quadratic memory usage in silhouette_score and re-enable Python bindings
5 participants