Add cuDFInterface to work with cuDF GPU dataframes and cupy support for XArrayInterface #3982

Merged
philippjfr merged 23 commits into master on Mar 4, 2020

Member

@philippjfr philippjfr commented Sep 21, 2019

This PR adds a new data interface to allow HoloViews to work directly with cuDF GPU dataframes.

  • values
  • range
  • dframe
  • select
  • groupby (look at optimizing)
  • aggregate
  • iloc
  • add_dimension
  • sort
  • sample
  • concat
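
To make the checklist above concrete, here is a toy, pure-Python sketch of what a few of these interface methods are expected to do on columnar data. The class name `ToyColumnInterface`, its simplified signatures, the half-open `select` bounds, and the dict-of-lists storage are all assumptions for illustration; the real cuDFInterface operates on cuDF dataframes and follows the richer HoloViews interface signatures.

```python
class ToyColumnInterface:
    """Illustrative stand-in for a columnar data interface (hypothetical)."""

    @classmethod
    def values(cls, data, dim):
        # Return one column as a flat list.
        return list(data[dim])

    @classmethod
    def range(cls, data, dim):
        # (min, max) of a column.
        column = data[dim]
        return (min(column), max(column))

    @classmethod
    def iloc(cls, data, rows):
        # Positional row selection applied to every column.
        return {dim: [col[i] for i in rows] for dim, col in data.items()}

    @classmethod
    def select(cls, data, **bounds):
        # Keep rows where low <= value < high for each named column.
        first = next(iter(data.values()))
        keep = [
            i for i, _ in enumerate(first)
            if all(lo <= data[dim][i] < hi for dim, (lo, hi) in bounds.items())
        ]
        return cls.iloc(data, keep)

    @classmethod
    def sort(cls, data, by):
        # Stable sort of all columns by the values of one column.
        order = [i for i, _ in sorted(enumerate(data[by]), key=lambda p: p[1])]
        return cls.iloc(data, order)

    @classmethod
    def aggregate(cls, data, by, value, function):
        # Group `value` by `by`, reduce each group, return keys in sorted order.
        groups = {}
        for key, val in zip(data[by], data[value]):
            groups.setdefault(key, []).append(val)
        keys = sorted(groups)
        return {by: keys, value: [function(groups[k]) for k in keys]}
```

For example, with `data = {"x": [3, 1, 2, 1], "y": [30.0, 10.0, 20.0, 5.0]}`, calling `ToyColumnInterface.aggregate(data, "x", "y", sum)` groups the `y` values by `x` and sums each group.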

Datashader support:

  • Points/Scatter
  • Curve/Path/Area
  • QuadMesh: rectilinear and curvilinear

Other things to do:

  • Implement cuDatashader support once integrated with datashader itself

One major thing to figure out is how we will set up CI tests for the GPU.
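
One low-tech way to keep a shared test suite green on CPU-only CI, sketched here as an assumption rather than the approach this PR settled on, is to skip GPU tests whenever cudf cannot be imported. The test class and the `run_gpu_tests` helper are hypothetical; the sketch also assumes that "cudf is importable" is a good enough proxy for "a GPU is available".

```python
import importlib.util
import unittest

# Assumption: an importable cudf package implies a usable GPU environment.
HAS_CUDF = importlib.util.find_spec("cudf") is not None


@unittest.skipUnless(HAS_CUDF, "cudf is not installed; skipping GPU tests")
class CuDFRoundTripTest(unittest.TestCase):
    """Hypothetical test case, not taken from the HoloViews test suite."""

    def test_from_pandas_round_trip(self):
        # Imported lazily so that collecting the test never requires a GPU.
        import cudf
        import pandas as pd

        pdf = pd.DataFrame({"x": [0, 1, 2], "y": [1.0, 2.0, 3.0]})
        gdf = cudf.from_pandas(pdf)
        # to_pandas() copies the data back to host memory for comparison.
        pd.testing.assert_frame_equal(gdf.to_pandas(), pdf)


def run_gpu_tests():
    """Run the GPU test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CuDFRoundTripTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

On a machine without cudf the test is reported as skipped rather than failed, so the same suite can run both locally on a GPU box and on hosted CI.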

Member

@jbednar jbednar left a comment

Cool! Yes, it will be very important to be able to test this in CI...

@jonmmease
Collaborator

> Implement GPU optimized histogram operation (in this PR)

I think we could do this with cupy.histogram with something like this:

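```python
import cupy
import cudf

cdf = cudf.from_pandas(...)  # "..." stands for an existing pandas DataFrame
# Wrap the column's device memory as a cupy array without copying
ca = cupy.array(cdf["col1"].to_gpu_array(), copy=False)
# Bin the values on the GPU using cupy's default number of bins
hist, bin_edges = cupy.histogram(ca)
```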

We'll probably also want to add a cupy interface for grid data eventually as well, so I'd be in favor of requiring that both cudf and cupy be installed for any GPU support.

@philippjfr
Member Author

> We'll probably also want to add a cupy interface for grid data eventually as well

This PR now also supports cupy-backed xarray objects, which required only a two-line change.

@philippjfr
Member Author

Okay, this PR is now almost ready. I've added tests, which won't run on Travis of course but can be run locally. I'll add some additional tests for datashader integration.

One major class of tests is still failing: the groupby/aggregation tests. It seems that cuDF's groupby(..., sort=False) returns the groups in reverse-sorted order, which seems quite odd. I'll keep looking into that, but in the worst case I'll disable those tests until the issue has been fixed upstream.
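
If disabling the tests turns out to be too blunt, another option is to make the assertions insensitive to group order. The sketch below uses plain Python tuples in place of dataframes, and the helper names `group_sum` and `assert_same_groups` are hypothetical; it only illustrates the comparison strategy and is not the HoloViews test code.

```python
def group_sum(rows, key, value):
    """Sum `value` per `key`, keeping groups in first-appearance order."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return list(totals.items())


def assert_same_groups(actual, expected):
    """Compare (key, aggregate) pairs while ignoring their order."""
    assert sorted(actual) == sorted(expected), (actual, expected)


rows = [
    {"k": "a", "v": 1},
    {"k": "c", "v": 2},
    {"k": "b", "v": 3},
    {"k": "a", "v": 4},
]
# Reference result in first-appearance order: a, c, b.
expected = group_sum(rows, "k", "v")
# Simulates a backend that hands the groups back reverse-sorted: c, b, a.
reverse_sorted = sorted(expected, reverse=True)
# Passes even though the two lists are ordered differently.
assert_same_groups(reverse_sorted, expected)
```

The trade-off is that such a comparison no longer checks ordering at all, so it suits tests where only the aggregated values matter.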

@philippjfr philippjfr changed the title Add cuDFInterface to work with cuDF GPU dataframes Add cuDFInterface to work with cuDF GPU dataframes and cupy support for XArrayInterface Feb 5, 2020
@philippjfr philippjfr merged commit 83490d7 into master Mar 4, 2020
@philippjfr philippjfr deleted the philippjfr/cuDF branch April 25, 2022 14:41
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024