
cuDF numba cuda 12 updates #13337

Merged

Conversation

brandon-b-miller
Contributor

@brandon-b-miller brandon-b-miller commented May 11, 2023

Summary of changes:

  • Removed some old code that is only used for numba<0.54, which has not been supported for a while now.
  • Removed some old code that is only used when cubinlinker is not present, which has been a hard requirement for a while now as well.
  • Created a file _numba.py and moved into it all of the machinery used to configure numba upon cuDF import. This includes functions for determining which toolkit version was used to build the PTX files our UDFs rely on, as well as functions for putting numba into MVC mode if necessary.
  • Created a file _ptxcompiler.py which vendors the driver/runtime version checking machinery from ptxcompiler, in case we're in a CUDA 12 environment that doesn't have it.
  • Changed the code to issue a warning in CUDA 12+ MVC situations that the library will likely not work.
  • The toolkit version used to decide whether MVC is required is now determined from the cc=60 PTX file, which is always built. This avoids needing to query the device compute capability through numba's cuda module, which must be avoided during numba's setup: if numba.cuda is imported before numba's config is modified, the config options have no effect.

Closes #13351
Closes #13339
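The toolkit-version-from-PTX approach described in the last bullet can be sketched roughly as follows. This is an illustration, not the actual cuDF code: the function name and regex are made up, and the mapping shown is the small subset that appears later in this review.

```python
import re

# Subset of the PTX ISA -> CUDA toolkit mapping discussed in this PR;
# a real table covers more versions.
PTX_ISA_TO_TOOLKIT = {
    "7.6": (11, 6),
    "7.7": (11, 7),
    "7.8": (11, 8),
    "8.0": (12, 0),
}


def toolkit_version_from_ptx(ptx_text):
    """Return the (major, minor) toolkit version that built a PTX file,
    inferred from its `.version` directive."""
    match = re.search(r"^\.version\s+(\d+\.\d+)", ptx_text, re.MULTILINE)
    if match is None:
        raise ValueError("no .version directive found in PTX")
    return PTX_ISA_TO_TOOLKIT[match.group(1)]
```

Because the cc=60 shim is always built, reading its header this way never touches the driver or `numba.cuda`.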

@brandon-b-miller brandon-b-miller added feature request New feature or request numba Numba issue non-breaking Non-breaking change python labels May 11, 2023
@brandon-b-miller brandon-b-miller self-assigned this May 11, 2023
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 11, 2023

from numba import config

ANY_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx"
Contributor Author

Ideally there's a better way of doing this rather than relying on a relative path.

Contributor

@bdice bdice May 15, 2023

A relative path is appropriate but use os.path.join and not string concatenation. You could also import cudf or a neighboring module and compute the path relative to that?
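The suggestion above can be sketched as follows; the helper name is made up for illustration.

```python
import os


def shim_ptx_path(module_file):
    """Build the PTX path with os.path.join instead of string
    concatenation, so separators are handled portably."""
    return os.path.join(
        os.path.dirname(module_file), "..", "core", "udf", "shim_60.ptx"
    )
```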

# cc=60 ptx is always built
cc = int(os.environ.get("STRINGS_UDF_CC", "60"))
else:
from numba import cuda
Contributor Author

Runtime imports can be expensive but importing cuda must be avoided as a side effect of importing the module in which this function resides. Another option could be passing the cuda module as an optional argument here.
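The "pass the cuda module as an optional argument" idea could look roughly like this (a hypothetical sketch, not the code in this PR):

```python
def get_compute_capability(cuda_module=None):
    """Accept numba.cuda as an optional argument so that importing the
    enclosing module never triggers `import numba.cuda` as a side
    effect; the import happens only when this function is called."""
    if cuda_module is None:
        from numba import cuda as cuda_module  # deferred runtime import
    return cuda_module.get_current_device().compute_capability
```

Callers that already hold a reference to `numba.cuda` can pass it in and skip the deferred import entirely.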

@brandon-b-miller brandon-b-miller marked this pull request as ready for review May 15, 2023 12:40
@brandon-b-miller brandon-b-miller requested review from a team as code owners May 15, 2023 12:40
Contributor

@bdice bdice left a comment

Thanks for doing this. I have attached comments.

dependencies.yaml
python/cudf/cudf/__init__.py

from numba import config

ANY_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx"
Contributor

@bdice bdice May 15, 2023

A relative path is appropriate but use os.path.join and not string concatenation. You could also import cudf or a neighboring module and compute the path relative to that?

python/cudf/cudf/utils/_numba_setup.py
python/cudf/cudf/utils/_numba_setup.py
sm_number = file_name.rstrip(".ptx").lstrip(prefix)
if sm_number.endswith("a"):
processed_sm_number = int(sm_number.rstrip("a"))
if processed_sm_number == cc:
Contributor

I’m confused why we follow a different logical path for compute capabilities ending in “a.” They’re not fundamentally different, just differently named.

Contributor Author

IIRC, this logic accounts for a multitude of different build configurations for both CI and local builds from source. I'm happy to follow up on this, but given that this function is only being moved, I'm hesitant to do anything that could perturb it as part of this PR.

python/cudf/cudf/utils/_numba_setup.py
driver_version, runtime_version, ptx_toolkit_version
):
# Numba thinks cubinlinker is only needed if the driver is older than
# the ctk, but when PTX files are present, it might also need to patch
Contributor

Just wanting to be precise in our comments here. Do we specifically mean the runtime rather than the “toolkit”?

Suggested change
# the ctk, but when PTX files are present, it might also need to patch
# the CUDA runtime, but when PTX files are present, it might also need to patch

Contributor

The toolkit component of concern is NVVM. We only use the runtime to check what version NVVM is (assuming if we find the runtime, that it's the same version as NVVM) because NVVM provides no way to check what version it is. I find it less confusing to refer to "the toolkit" as it's the collection of components including the runtime and NVVM, amongst other things.

python/cudf/cudf/utils/_numba_setup.py
"7.6": (11, 6),
"7.7": (11, 7),
"7.8": (11, 8),
"8.0": (12, 0),
Contributor

Do we need to expand this to include 12.1? Users could run CUDA 12 pip wheels on 12.1.

Contributor

Should we try to fall back to parsing the line “Cuda compilation tools, release 11.6” if this mapping fails? I know it might be less safe than the current solution but I wish we could use that to eliminate the need for a specific mapping that must be updated with CUDA releases.

Contributor Author

I agree with this sentiment. Similarly to above however I think it's better to tackle this particular update separately as this function is a move. That said the change should be simple and I can commit to having it in for this release.
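The fallback proposed above could look roughly like this (a hedged sketch; the function name is illustrative):

```python
import re


def toolkit_version_from_release_line(text):
    """Parse a 'Cuda compilation tools, release X.Y' line, as emitted
    by `nvcc --version`, into a (major, minor) tuple, avoiding a
    hard-coded PTX-ISA-to-toolkit mapping that must be updated for
    every CUDA release."""
    match = re.search(r"Cuda compilation tools, release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no release line found")
    return int(match.group(1)), int(match.group(2))
```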

@bdice bdice mentioned this pull request May 22, 2023
python/cudf/cudf/__init__.py Outdated Show resolved Hide resolved
python/cudf/cudf/__init__.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/udf/utils.py Outdated Show resolved Hide resolved
CC_60_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx"
_NO_DRIVER = (math.inf, math.inf)

CMD = """\
Contributor

Use a descriptive name.

Suggested change
CMD = """\
CHECK_CUDA_VERSION_CMD = """\

python/cudf/cudf/utils/_numba.py
python/cudf/cudf/utils/_numba.py
This function is mostly vendored from ptxcompiler and is used
to check the system CUDA driver and runtime versions in its absence.
"""
cp = subprocess.run([sys.executable, "-c", CMD], capture_output=True)
Contributor

Does ptxcompiler use a subprocess currently? I am surprised by this because this tends to be costly. If possible, we should avoid launching a subprocess at cudf import time.

Contributor

@bdice bdice May 22, 2023

Is there a way to use any other tool (cuda-python, rmm, ...) to know this version information? I assume the goal of using a subprocess is to avoid importing numba.cuda in this process...

Contributor Author

@brandon-b-miller brandon-b-miller May 22, 2023

Yes, ptxcompiler uses a subprocess. This command is vendored directly from it here. While doing all of our setup before importing numba.cuda is one goal, another requirement is that the process for getting the versions doesn't cuinit, otherwise it can interfere with the way dask initializes its networking. More generally we need to avoid cuinit fully during import.
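To illustrate the pattern (not the actual ptxcompiler code): the child process is the one that loads the CUDA libraries and prints raw version integers, so any driver initialization happens in a throwaway process, and the parent only parses stdout. The library names assume Linux, and the decoding helper reflects CUDA's usual integer version encoding.

```python
import subprocess
import sys

# Script run in the child process; ctypes loads the driver/runtime
# there, keeping any cuInit out of the importing process.
CHECK_CUDA_VERSION_CMD = """\
import ctypes
cuda = ctypes.CDLL("libcuda.so.1")
cudart = ctypes.CDLL("libcudart.so")
drv, rt = ctypes.c_int(), ctypes.c_int()
cuda.cuDriverGetVersion(ctypes.byref(drv))
cudart.cudaRuntimeGetVersion(ctypes.byref(rt))
print(drv.value, rt.value)
"""


def decode_cuda_version(v):
    # CUDA encodes versions as 1000 * major + 10 * minor, e.g. 12010 -> (12, 1)
    return v // 1000, (v % 1000) // 10


def safe_get_versions():
    cp = subprocess.run(
        [sys.executable, "-c", CHECK_CUDA_VERSION_CMD],
        capture_output=True,
        text=True,
    )
    if cp.returncode != 0:
        return None  # e.g. no driver/runtime present
    drv, rt = (int(x) for x in cp.stdout.split())
    return decode_cuda_version(drv), decode_cuda_version(rt)
```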

Contributor Author

As an aside it would be nice if we had some kind of small independent package we could list as a dependency that served as a source of truth for the driver and runtime versions, which we could also configure to avoid forking a subprocess, calling cuinit, etc. This would avoid a bunch of reinventing the wheel across cuda-python, numba, rmm, etc....

Contributor

I believe cuda-python is supposed to be just this, but in practice we’ve seen various issues (not providing the right runtime version, among others). I am not sure but recent cuda-python versions may be better at this?

Contributor Author

@brandon-b-miller brandon-b-miller May 22, 2023

I tried putting a

from cuda import cuda, cudart
cuda.cuDriverGetVersion()
cudart.cudaRuntimeGetVersion()

at the top of cudf's __init__.py and ran test_no_cuinit.py, which passed. So I suppose it's a clean approach from the cuInit perspective. Happy to refactor as such here and see if things work.

RMM does seem to list some kind of "limitation" here and falls back to numba, but I'm not sure what it means exactly:
https://github.com/rapidsai/rmm/blob/branch-23.06/python/rmm/_cuda/gpu.py#L81-L85

Member

Think this is referring to issue ( NVIDIA/cuda-python#16 )

Namely the version cuda-python returns is hard-coded at build time as opposed to querying it at run time. So the version it returns may not reflect what the user has installed on their system (and will instead reflect how the binary was built). The issue explains why this is happening currently (though it still presents an issue with intended usage)

There's more details in the threads in PR ( rapidsai/rmm#946 ) as well

Contributor Author

To close the loop on this I decided in 5cb0ce6 to just vendor the three functions to _ptxcompiler.py.

To summarize the way it works now, the user either has ptxcompiler or doesn't. If they do, use it to get the versions (this launches a subprocess). If they don't, use the vendored functions to do so (this also launches the same subprocess using the same command).

We are constrained to use a subprocess for three reasons:

  1. We cannot call cuInit during cudf's import without interfering with dask.
  2. We cannot use numba.cuda directly to obtain the versions, because we need to finish configuring numba before numba.cuda is imported.
  3. We cannot use cuda.cuda due to the constraint described above.

Even with this approach we need to leave in place a mechanism to disable spawning a subprocess in HPC environments where it is not safe to do so. In that case the user may set an environment variable that disables the check, plus two more environment variables specifying the driver and runtime versions manually, and these are parsed and returned instead. These environment variables are the same between ptxcompiler and our vendored version.
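The escape hatch described above could be sketched like this. The environment variable names here are illustrative placeholders, not the actual ones shared by ptxcompiler and the vendored code.

```python
import os


def get_versions_with_override(subprocess_check):
    """Sketch of the HPC escape hatch: if the check is disabled, read
    the driver and runtime versions from environment variables instead
    of spawning a subprocess. Variable names are hypothetical."""
    if os.environ.get("CUDF_SKIP_CUDA_VERSION_CHECK", "0") == "1":
        driver = tuple(
            int(p) for p in os.environ["CUDF_DRIVER_VERSION"].split(".")
        )
        runtime = tuple(
            int(p) for p in os.environ["CUDF_RUNTIME_VERSION"].split(".")
        )
        return driver, runtime
    # Otherwise fall back to the subprocess-based check.
    return subprocess_check()
```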

@brandon-b-miller
Contributor Author

I moved a little of the logic around here and vendored the few extra pieces of ptxcompiler that I think we need into _ptxcompiler.py to keep things safe. I think the logic reads a lot more simply now, and I'm hoping one more round of review will be enough to merge this.


from numba import config

CC_60_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx"
Contributor

This needs to use os.path.join.

Member

@ajschmidt8 ajschmidt8 left a comment

Approving ops-codeowner file changes


NO_DRIVER = (math.inf, math.inf)

CMD = """\
Contributor

Can we name this a proper name? NUMBA_CHECK_VERSION_CMD or similar.

Contributor

@bdice bdice left a comment

Looks fine to me. I discussed with @brandon-b-miller and I feel like I understand more of the constraints. I am not sure that we have the best solution but I'm not sure how to improve it at this point.

I requested that we run a manual test, installing the CUDA 12 wheel from the CI artifacts on a system with driver 12.0 and a Docker image with runtime 12.1. This is supposed to raise a warning, I think, and I'd like to see that warning occur in manual testing (we don't have a CI configuration where this can be tested). @brandon-b-miller Please report back the result or let me know if you need help doing this.

@brandon-b-miller
Copy link
Contributor Author

brandon-b-miller commented May 23, 2023

Looks fine to me. I discussed with @brandon-b-miller and I feel like I understand more of the constraints. I am not sure that we have the best solution but I'm not sure how to improve it at this point.

I requested that we run a manual test, installing the CUDA 12 wheel from the CI artifacts on a system with driver 12.0 and a Docker image with runtime 12.1. This is supposed to raise a warning, I think, and I'd like to see that warning occur in manual testing (we don't have a CI configuration where this can be tested). @brandon-b-miller Please report back the result or let me know if you need help doing this.

I have tested this with a bare-metal machine with

NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0

And the 12.1 toolkit, verified with nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

With this I obtain the warning I expect

UserWarning: Using CUDA toolkit version (12, 1) with CUDA driver version (12, 0) requires minor version compatibility, which is not yet supported for CUDA driver versions 12.0 and above. It is likely that many cuDF operations will not work in this state. Please install CUDA toolkit version (12, 0) to continue using cuDF

As well as the error I expect when performing an operation that requires a numba kernel such as

>>> cudf.Series([1,2,3])[0]

From which I get

ptxas application ptx input, line 9; fatal   : Unsupported .version 8.1; current version is '8.0'

So things check out from my end.

Contributor

@gmarkall gmarkall left a comment

This looks good. The vendoring of small parts of the ptxcompiler patch module looks fine.

@brandon-b-miller
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 12acf92 into rapidsai:branch-23.06 May 23, 2023
@shwina
Contributor

shwina commented May 23, 2023

/merge

@jakirkham
Member

Thanks all! 🙏

Really impressive work! 👏
