Replace use of custom CUDA bindings with CUDA-Python #930

shwina · 2021-12-03T15:41:46Z

This PR removes many of the custom CUDA bindings we wrote in RMM to support calls to the driver/runtime APIs from Python in downstream libraries (cudf, cuml, cugraph). We should now use CUDA Python instead.

However, the module rmm._cuda.gpu is not being removed. It has been converted from an extension module (.pyx) to a regular .py module. This module contains high-level wrappers around raw CUDA bindings, with some niceties like converting errors to exceptions with the appropriate error message. Reimplementing that functionality in each downstream library would be a bad idea. When CUDA Python rolls its own higher-level API, we can remove the gpu module as well.

One API change worth mentioning here is to the function rmm._cuda.gpu.getDeviceAttribute. Previously, the API accepted a cudaDeviceAttr, a type defined as part of RMM's custom CUDA bindings. The API has now changed to accept a cudaDeviceAttr defined in CUDA-Python. This requires changes in downstream libraries that use this API.
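The wrapper pattern described above (unpack the status, raise on failure) together with the new argument type can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual rmm._cuda.gpu code: `FakeError`, `FakeDeviceAttr`, `fake_device_get_attribute`, and `SketchRuntimeError` are stand-ins invented here so the snippet runs without a GPU or CUDA Python installed. It assumes the raw binding returns a `(status, value)` tuple, the same convention visible in the `err, name = cudart.cudaGetErrorName(status)` call discussed in review. The `isinstance` check is an illustrative choice only.

```python
import enum


class FakeError(enum.IntEnum):
    """Hypothetical stand-in for the runtime error enum (two members only)."""
    SUCCESS = 0
    INVALID_VALUE = 1


class FakeDeviceAttr(enum.IntEnum):
    """Hypothetical stand-in for the device attribute enum."""
    MAX_THREADS_PER_BLOCK = 1


def fake_device_get_attribute(attr, device):
    """Stand-in for a raw binding: returns (status, value) and never raises."""
    if device != 0:
        return FakeError.INVALID_VALUE, 0
    return FakeError.SUCCESS, 1024


class SketchRuntimeError(RuntimeError):
    """Exception that carries the failing status code."""

    def __init__(self, status):
        self.status = status
        super().__init__(f"{status.name} ({int(status)})")


def get_device_attribute(attr, device):
    """High-level wrapper: convert a non-success status into an exception."""
    # Accept only the enum type from the (stand-in) bindings package.
    if not isinstance(attr, FakeDeviceAttr):
        raise TypeError("attr must be a FakeDeviceAttr member")
    status, value = fake_device_get_attribute(attr, device)
    if status != FakeError.SUCCESS:
        raise SketchRuntimeError(status)
    return value
```

In a downstream library, the caller would pass the enum member from CUDA Python in place of `FakeDeviceAttr`, rather than the type RMM used to define.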

I am marking this PR non-breaking as it does not affect the user-facing API. It does cause breakages in downstream libraries that are currently relying on internal APIs (from the rmm._cuda module).

jakirkham · 2021-12-03T16:32:17Z

Seeing all of the code dropped here is great! Thanks for working on this Ashwin! 😄

bdice

This is awesome @shwina! Thanks for doing this. I reviewed this primarily to learn more about RMM's Cython API, which I haven't looked at much before. I have one suggestion and one question attached below.

python/rmm/_cuda/gpu.py

harrism · 2021-12-06T20:58:14Z

This is so cool. RMM is getting leaner!

harrism · 2021-12-06T20:58:42Z

Is this breaking or non-breaking?

shwina · 2021-12-06T21:48:28Z

I'm marking it non-breaking, as the only changes are to non-public APIs. However, this will break cudf as it consumes non-public RMM APIs.

As part of this PR, I'm going to remove those APIs entirely from RMM (rmm._cuda.gpu), and instead have cudf also use CUDA Python directly.

python/rmm/_lib/memory_resource.pxd

jakirkham · 2021-12-06T21:52:03Z

Also should we be adding cuda-python to install_requires in setup.py as well as in the Conda recipe and environments?

shwina · 2021-12-06T21:57:43Z

Thanks @jakirkham - I just did that. Could you please double check that the constraints are appropriate?

bdice · 2021-12-06T22:23:56Z

> Also should we be adding cuda-python to install_requires in setup.py as well as in the Conda recipe and environments?

Also python/dev_requirements.txt. Also, the install_requires are currently unpinned but the setup_requires is pinned, which strikes me as unexpected.

Is it intentional that we have pinnings in pyproject.toml and setup.py's setup_requires for Cython? I think we only need the pyproject.toml build system but am not sure.

jakirkham · 2021-12-06T23:03:51Z

> Thanks @jakirkham - I just did that. Could you please double check that the constraints are appropriate?

Hmm...I'm not seeing them. Did they get pushed?

bdice · 2022-01-14T23:15:52Z

python/rmm/_cuda/gpu.py

+        err, name = cudart.cudaGetErrorName(status)
+        if err != cudart.cudaError_t.cudaSuccess:
+            raise CUDARuntimeError(err.value)


It's awkward that we might raise a CUDARuntimeError from within the constructor of CUDARuntimeError... which got me to looking a little closer.

I saw the suggestion from @jakirkham in #930 (comment) / 8be6c0b but I'm pretty sure this is a spurious error check that illuminates some slightly awkward design in cuda-python (a meaningless err value).

For cudaGetErrorName, the err value is hardcoded as cudaError_t.cudaSuccess. It's not likely that this will ever change, because the corresponding runtime API does not generate a CUDA error: cudaGetErrorName returns a const char* with a special value of "unrecognized error code" rather than setting an error code if the parameters are invalid.
https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/cudart.pyx#L8084

Similarly for cudaGetErrorString... sort of:
https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/cudart.pyx#L8115
This function actually can set an error code via its call to the driver API cuGetErrorString, but that error is not returned -- the cudaGetErrorString in cuda-python also has err hard-coded as a success and the value "unrecognized error code" is returned. https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/ccudart.pyx#L522-L531

In summary, I think it would be safe to revert 8be6c0b and use the previous snippet with a throwaway variable _, name = ... for both cudaGetErrorName and cudaGetErrorString because the err is always a success. As a consequence, we don't need to handle this awkward possibility of raising an error class in its own constructor.
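The throwaway-variable form proposed here could look roughly like the sketch below. It uses hypothetical stand-ins (`FakeError`, `fake_get_error_name`, `fake_get_error_string`, `SketchRuntimeError`) so that it runs without CUDA Python. The stand-ins always report success as their first return value, as described above, and they are assumed to return byte strings, which is an assumption not stated in this thread. The names and messages in the lookup tables are illustrative.

```python
import enum


class FakeError(enum.IntEnum):
    """Hypothetical stand-in for the runtime error enum."""
    SUCCESS = 0
    MEMORY_ALLOCATION = 2


# Illustrative lookup tables; the real strings come from the CUDA runtime.
_NAMES = {
    FakeError.SUCCESS: b"cudaSuccess",
    FakeError.MEMORY_ALLOCATION: b"cudaErrorMemoryAllocation",
}
_MESSAGES = {
    FakeError.SUCCESS: b"no error",
    FakeError.MEMORY_ALLOCATION: b"out of memory",
}
_UNKNOWN = b"unrecognized error code"


def fake_get_error_name(status):
    """Stand-in lookup: the first element is always SUCCESS."""
    return FakeError.SUCCESS, _NAMES.get(status, _UNKNOWN)


def fake_get_error_string(status):
    """Stand-in lookup: the first element is always SUCCESS."""
    return FakeError.SUCCESS, _MESSAGES.get(status, _UNKNOWN)


class SketchRuntimeError(RuntimeError):
    """Builds its message from the status without checking the lookup status."""

    def __init__(self, status):
        self.status = status
        # The first tuple element is discarded because it carries no information.
        _, name = fake_get_error_name(status)
        _, msg = fake_get_error_string(status)
        super().__init__(f"{name.decode()}: {msg.decode()}")
```

Because the constructor never inspects the discarded value, it has no code path that raises the same exception class from inside its own `__init__`.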

jakirkham · 2022-01-15

I agree that it is awkward. I would hope this error can't come up, but it might if the error code is new, in other words one that only shows up in later CUDA versions but not earlier ones. Or it could show up if the error code is malformed.

Maybe a middle ground would be to raise a RuntimeError so we don't have a recursive CUDARuntimeError definition. Thoughts?

bdice · 2022-01-15

If a new or malformed error code is provided, the returned string from cudaGetErrorName or cudaGetErrorString will say "unrecognized error code" but no CUDA error will be set. Thus we do not need to check for a CUDA error from those functions. The cuda-python decision to hard-code the err value as "success" strikes me as odd because no err should be returned at all.

If we were to need to raise an error, the recursive use of CUDARuntimeError is not problematic/incorrect -- just awkward.

However, I don't have strong enough feelings on this to hold back on merging.

jakirkham · 2022-01-15

Perhaps we should check for that build string and then error? Having suppressed errors doesn't sound good.

Agree with your assessment CUDA-Python should be handling this better, but maybe we can have that discussion in a new CUDA-Python issue?

bdice · 2022-01-15

That sounds like a good idea! Raise a RuntimeError if we get the unrecognized error code string back, but don’t check the value of err:

Suggested change

-        err, name = cudart.cudaGetErrorName(status)
-        if err != cudart.cudaError_t.cudaSuccess:
-            raise CUDARuntimeError(err.value)
+        _, name = cudart.cudaGetErrorName(status)
+        if name == "unrecognized error code":
+            raise RuntimeError(name)

(and similarly for cudaGetErrorString)

shwina · 2022-01-18

After a brief sync offline with Bradley, we decided to just not worry about handling an unrecognized error code, because we couldn't figure out how such an error code could be constructed to begin with. cudaError_t is an enum that cannot be constructed from arbitrary integers.
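The point about enums can be illustrated with a small sketch. It assumes that cudaError_t behaves like a standard Python `enum.IntEnum`, which is an assumption about CUDA Python rather than something confirmed in this thread. `FakeError` and `to_status` are hypothetical names used only for the illustration.

```python
import enum


class FakeError(enum.IntEnum):
    """Hypothetical stand-in for an error enum with two known codes."""
    SUCCESS = 0
    INVALID_VALUE = 1


def to_status(code):
    """Return the enum member for code, or None if no member has that value."""
    try:
        return FakeError(code)
    except ValueError:
        # A standard Python enum rejects integers that match no member.
        return None
```

Under that assumption, an unknown integer fails at enum construction time, before it could ever reach an error-name lookup.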

bdice

LGTM! Thanks @shwina!

jakirkham · 2022-01-18T18:37:44Z

rerun tests

harrism

It's great to simplify. I will leave the Python review up to Python experts -- what would you like me to review?

python/rmm/_cuda/gpu.py

jakirkham · 2022-01-19T04:35:28Z

I think you had some review comments above. So was curious if those were addressed from your perspective

If anything else pops out, please let us know :)

harrism · 2022-01-19T04:37:11Z

> I think you had some review comments above. So was curious if those were addressed from your perspective
>
> If anything else pops out, please let us know :)

Just the one comment about why we are keeping some cudart functions in RMM.

shwina · 2022-01-19T15:28:06Z

@gpucibot merge

… CUDA Python bindings (#10008)

This PR replaces custom CUDA bindings that are provided by RMM, with official CUDA Python bindings. This PR should be merged after the RMM PR rapidsai/rmm#930

Authors:
- Ashwin Srinath (https://github.com/shwina)

Approvers:
- Jordan Jacobelli (https://github.com/Ethyling)
- Vyas Ramasubramani (https://github.com/vyasr)
- Bradley Dice (https://github.com/bdice)

URL: #10008

…451)

As a follow up to rapidsai/rmm#930, fix RAFT to rely on CUDA Python directly rather than custom CUDA bindings that RMM provided.

Authors:
- Ashwin Srinath (https://github.com/shwina)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
- Jordan Jacobelli (https://github.com/Ethyling)

URL: #451

Please do not merge until rapidsai/rmm#930 is merged. For the reasons described in that PR, this API has changed to accept a `cuda.cudart.cudaDeviceAttr` object, `cuda` being the official CUDA Python bindings package.

Authors:
- Ashwin Srinath (https://github.com/shwina)

Approvers:
- Brad Rees (https://github.com/BradReesWork)
- Rick Ratzel (https://github.com/rlratzel)
- AJ Schmidt (https://github.com/ajschmidt8)

URL: #2008

Replace use of custom CUDA bindings with CUDA-Python

f407c40

shwina requested a review from a team as a code owner December 3, 2021 15:41

github-actions bot added the Python label (Related to RMM Python API) Dec 3, 2021

shwina mentioned this pull request Dec 3, 2021

Add cuda-python rapidsai/integration#401

Merged

shwina marked this pull request as draft December 3, 2021 16:06

bdice requested changes Dec 6, 2021

python/rmm/_cuda/gpu.py (resolved)

python/rmm/_cuda/gpu.py (outdated, resolved)

harrism added the improvement label (Improvement / enhancement to an existing function) Dec 6, 2021

jakirkham reviewed Dec 6, 2021

python/rmm/_lib/memory_resource.pxd (outdated, resolved)

shwina and others added 3 commits December 6, 2021 16:52

Undo deletion

9f564ba

Update python/rmm/_cuda/gpu.py

c873b89

Co-authored-by: Bradley Dice

Add cuda-python to recipe, envs and setup.py

7e14c23

shwina added the non-breaking label (Non-breaking change) Dec 6, 2021

Merge branch 'use-cuda-python' of github.com:shwina/rmm into use-cuda-python

21920ac

github-actions bot added the conda label Dec 7, 2021

bdice reviewed Dec 7, 2021

conda/environments/rmm_dev_cuda10.1.yml (outdated, resolved)

conda/recipes/rmm/meta.yaml (outdated, resolved)

python/setup.py (resolved)

Small fixes

54659cc

harrism reviewed Dec 8, 2021

conda/environments/rmm_dev_cuda10.2.yml (outdated, resolved)

harrism reviewed Dec 8, 2021

conda/environments/rmm_dev_cuda11.0.yml (outdated, resolved)

harrism reviewed Dec 8, 2021

conda/environments/rmm_dev_cuda10.1.yml (outdated, resolved)

shwina and others added 2 commits December 8, 2021 07:43

Update conda/environments/rmm_dev_cuda11.0.yml

b07e7f6

Co-authored-by: Mark Harris

Fix CUDA Python version in meta.yaml

afc75c2

bdice reviewed Jan 11, 2022

python/rmm/_cuda/gpu.py (outdated, resolved)

jakirkham reviewed Jan 11, 2022

python/rmm/_cuda/gpu.py (resolved)

shwina added 2 commits January 13, 2022 07:20

Fix type of arg

84994cb

Check status after calls to cudaGetErrorName/String

8be6c0b

shwina removed the 5 - DO NOT MERGE label (Hold off on merging; see PR for details) Jan 14, 2022

shwina requested review from jakirkham and bdice January 14, 2022 18:42

bdice requested changes Jan 14, 2022

shwina and others added 4 commits January 18, 2022 10:34

Update python/rmm/_cuda/gpu.py

73d3181

Co-authored-by: Bradley Dice

Update python/rmm/_cuda/gpu.py

ee80f22

Co-authored-by: Bradley Dice

Avoid recursion in CUDADriverError and CUDARuntimeError

0ee073c

Just don't worry about it.

8e4b680

bdice approved these changes Jan 18, 2022

jakirkham approved these changes Jan 18, 2022

jakirkham requested review from harrism and vyasr January 18, 2022 22:57

harrism reviewed Jan 19, 2022

python/rmm/_cuda/gpu.py (resolved)

rapids-bot bot merged commit d94bdfd into rapidsai:branch-22.02 Jan 19, 2022

shwina mentioned this pull request Jan 19, 2022

Replace RMM CUDA Python bindings with those provided by CUDA-Python rapidsai/raft#451

Merged

charlesbluca mentioned this pull request Apr 20, 2022

UserWarning: A CUDA context for device 0 already exists rapidsai/dask-cuda#867

Closed
