[FEA] Support Naive Bayes variants #1666

Closed
2 of 4 tasks
cjnolet opened this issue Feb 12, 2020 · 0 comments · Fixed by #4595
Labels
Cython / Python Cython or Python issue Dask / cuml.dask Issue/PR related to Python level dask or cuml.dask features. feature request New feature or request

cjnolet commented Feb 12, 2020

There are 4 different variants of Naive Bayes in Scikit-learn:

  • Multinomial
  • Gaussian
  • Bernoulli
  • Complement

Between experimenting with CuPy RawKernel and abstracting it for type agnosticism, creating a new directory for CuPy/Python-based prims, and writing the initial multinomial Naive Bayes implementation, the Naive Bayes PR (#1375) has become quite large and needs to be merged.

The primary primitive in the multinomial Naive Bayes variant, a custom RawKernel that uses shared memory and atomicAdd to count features for each class, also supports squaring the sums so that it can be used to extract a mean and variance for the Gaussian variant. The remaining variants of the algorithm should also be able to make use of this primitive.
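To illustrate the idea, here is a minimal CPU sketch in NumPy, not the actual RawKernel from #1375. It assumes the primitive reduces to a per-class scatter-add of either the feature values or their squares, and that every class has at least one sample. The names `class_feature_sums` and `gaussian_stats` are hypothetical and chosen only for this example.

```python
import numpy as np

def class_feature_sums(X, y, n_classes, square=False):
    """Accumulate per-class sums (or sums of squares) of each feature."""
    X = np.asarray(X, dtype=np.float64)
    vals = X * X if square else X
    out = np.zeros((n_classes, X.shape[1]), dtype=np.float64)
    # Unbuffered scatter-add: a CPU stand-in for atomicAdd on the GPU.
    np.add.at(out, y, vals)
    return out

def gaussian_stats(X, y, n_classes):
    """Derive per-class mean and (population) variance from the two sums."""
    y = np.asarray(y)
    counts = np.bincount(y, minlength=n_classes).astype(np.float64)
    sums = class_feature_sums(X, y, n_classes)
    sq_sums = class_feature_sums(X, y, n_classes, square=True)
    mean = sums / counts[:, None]
    # Var[x] = E[x^2] - E[x]^2, computed per class and per feature.
    var = sq_sums / counts[:, None] - mean ** 2
    return mean, var

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 10.0]])
y = np.array([0, 0, 1, 1])
mean, var = gaussian_stats(X, y, n_classes=2)
```

The `E[x^2] - E[x]^2` form needs only one pass over the data per sum, which is why a single counting primitive can serve both variants, though it can lose precision when the variance is small relative to the mean.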

Given the infrastructure provided by #1375, adding these variants should be straightforward, with effort ranging from moderate to trivial. The distributed variants should be able to simply proxy to the single-GPU classes, combining the underlying parameters in the same way as the existing multinomial version.
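A rough sketch of what combining parameters could look like, assuming the per-partition parameters are additive counts (per-class feature counts and class counts) that can simply be summed. This is an illustration in NumPy, not the cuml.dask implementation. `partial_counts` and `combine_counts` are hypothetical names.

```python
import numpy as np

def partial_counts(X, y, n_classes):
    """Counts computed independently on one partition (one worker)."""
    feature_count = np.zeros((n_classes, X.shape[1]), dtype=np.float64)
    np.add.at(feature_count, y, X)  # per-class feature totals
    class_count = np.bincount(y, minlength=n_classes).astype(np.float64)
    return feature_count, class_count

def combine_counts(partials):
    """Merge per-partition counts by element-wise addition."""
    feature_count = sum(fc for fc, _ in partials)
    class_count = sum(cc for _, cc in partials)
    return feature_count, class_count

# Toy data split into two partitions, as two workers might see it.
X = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])
parts = [partial_counts(X[:2], y[:2], 2), partial_counts(X[2:], y[2:], 2)]

merged_fc, merged_cc = combine_counts(parts)
full_fc, full_cc = partial_counts(X, y, 2)
```

Because the counts are additive, the merged result matches what a single pass over the full dataset would produce, independent of how the rows are partitioned.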

@cjnolet cjnolet added feature request New feature or request ? - Needs Triage Need team to review and classify labels Feb 12, 2020
@cjnolet cjnolet added Cython / Python Cython or Python issue Dask / cuml.dask Issue/PR related to Python level dask or cuml.dask features. and removed ? - Needs Triage Need team to review and classify labels Feb 12, 2020
rapids-bot bot pushed a commit that referenced this issue Jul 22, 2021
This is a continuation of PR #1763, to add Multinomial and Bernoulli NB variants.
The Gaussian and Categorical variants will be added in a following PR.

Also linking issue #1666

Authors:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4053
rapids-bot bot pushed a commit that referenced this issue Aug 9, 2021
This is a continuation of PR #1763 and #4053, to add Gaussian Naive Bayes.
This is supposed to be merged after #4053.

Here is a comparison of cuML and SKLearn performance on Gaussian NB.
This is done using a synthetic dataset generated by make_regression.
The GPU used is an RTX 8000, and the CPU is an i9-10920X @ 3.50GHz.
![gaussian](https://user-images.githubusercontent.com/9810050/126572439-8982faa8-5ad1-4bca-91ab-76704050bf33.png)

Linking issue #1666

Authors:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4079
rapids-bot bot pushed a commit that referenced this issue Sep 8, 2021
This is a continuation of PR #1763, #4053, and #4079, to add Categorical Naive Bayes.
This is supposed to be merged after #4079.
Linking issue #1666.

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4150
@rapids-bot rapids-bot bot closed this as completed in #4595 Mar 7, 2022
rapids-bot bot pushed a commit that referenced this issue Mar 7, 2022
Closes #1666.
The implementation of this variant is straightforward and matches sklearn.

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4595