Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Categorical Naive Bayes #4150

Merged
merged 17 commits into from
Sep 8, 2021

Conversation

lowener
Copy link
Contributor

@lowener lowener commented Aug 6, 2021

This is a continuation of PR #1763, #4053, and #4079, to add Categorical Naive Bayes.
This is supposed to be merged after #4079.
Linking issue #1666.

@github-actions github-actions bot added the Cython / Python Cython or Python issue label Aug 6, 2021
@lowener lowener added feature request New feature or request non-breaking Non-breaking change labels Aug 6, 2021
@lowener lowener marked this pull request as ready for review August 12, 2021 16:13
@lowener lowener requested a review from a team as a code owner August 12, 2021 16:13
@lowener
Copy link
Contributor Author

lowener commented Aug 16, 2021

Here is a comparison of cuML and SKLearn performance on Categorical NB.
This is done using a synthetic dataset generated by make_classification with 4 classes.
The GPU used is a RTX 8000, and the CPU is i9-10920X @ 3.50GHz
cnb_4classes_custom_kernel_1

@lowener lowener added the 3 - Ready for Review Ready for review by team label Aug 16, 2021
@cjnolet
Copy link
Member

cjnolet commented Aug 25, 2021

@lowener can you include the steps taken for the timings in your benchmark chart? I'm mostly interested in whether these timings are only for training or if they also include the likelihoods.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your implementaiton looks great overall. This algorithm is immensely popular on sparse inputs / bigraphs, though, so we should strive to support sparse inputs and, as a result, assume a significantly large upper-bound on the number of features.

python/cuml/naive_bayes/naive_bayes.py Outdated Show resolved Hide resolved
python/cuml/naive_bayes/naive_bayes.py Outdated Show resolved Hide resolved
python/cuml/naive_bayes/naive_bayes.py Outdated Show resolved Hide resolved
python/cuml/naive_bayes/naive_bayes.py Outdated Show resolved Hide resolved
python/cuml/naive_bayes/naive_bayes.py Show resolved Hide resolved
@lowener
Copy link
Contributor Author

lowener commented Sep 7, 2021

I added support for sparse inputs and removed the loops over n_features.

For the benchmark previously posted I was only timing the fit() operation. Now the timing also include the predict():

CategoricalNB().fit(X, Y).predict(X)

And we can see that the removal of the loop over n_features greatly boosted the performances.

cnb_no_loop

@lowener lowener requested a review from cjnolet September 7, 2021 13:19
@lowener lowener added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Sep 7, 2021
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look really good and the benchmarks are super impressive. Just one little cleanup opportunity remains.

python/cuml/naive_bayes/naive_bayes.py Outdated Show resolved Hide resolved
@lowener lowener requested a review from cjnolet September 8, 2021 10:27
@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@0e770fa). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #4150   +/-   ##
===============================================
  Coverage                ?   86.07%           
===============================================
  Files                   ?      231           
  Lines                   ?    18637           
  Branches                ?        0           
===============================================
  Hits                    ?    16042           
  Misses                  ?     2595           
  Partials                ?        0           
Flag Coverage Δ
dask 47.05% <0.00%> (?)
non-dask 78.73% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e770fa...af27bfa. Read the comment docs.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cjnolet
Copy link
Member

cjnolet commented Sep 8, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 496ddf0 into rapidsai:branch-21.10 Sep 8, 2021
@lowener lowener deleted the 21.10-categorical-nb branch September 17, 2021 16:10
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
This is a continuation of PR rapidsai#1763, rapidsai#4053, and rapidsai#4079, to add Categorical Naive Bayes.
This is supposed to be merged after rapidsai#4079.
Linking issue rapidsai#1666.

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4150
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on Reviewer Waiting for reviewer to review or respond Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants