Only build conda packages once for libraries without direct CUDA dependency #67

Open
3 tasks
jameslamb opened this issue May 29, 2024 · 2 comments

Comments

Member

jameslamb commented May 29, 2024

Description

Created from @bdice's suggestion at rapidsai/cudf#15245 (comment).

Some RAPIDS libraries do not have a direct CUDA dependency, but we're still doing multiple conda builds (one per CUDA major version) for them.

Those projects' conda packages should only be built once, and be free from unnecessarily-declared CUDA dependencies.

See "Notes" for a concrete example.

Benefits of this work

  • reduces CI build times
  • reduces storage footprint on Anaconda
  • simplifies some conda package recipes

Acceptance Criteria

For every RAPIDS library that doesn't have a CUDA dependency, the following should be true for their conda packages:

  • CUDA version is not part of the conda package build string
  • the number of packages produced should not depend on the number of CUDA major versions RAPIDS supports
    • e.g., custreamz should go from 6 conda packages (2 CUDA versions x 3 Python versions) to 3 (3 Python versions)
  • conda packages are still tested on all CUDA versions RAPIDS supports
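
The package-count criterion above can be sketched as a small build-matrix calculation. This is only an illustration: the version lists and the `build_matrix` helper are hypothetical, chosen to match the "2 CUDA versions x 3 Python versions" example, and are not taken from any RAPIDS CI configuration.

```python
from itertools import product

# Hypothetical version lists, chosen only to match the 2 x 3 example above.
CUDA_MAJORS = ["11", "12"]
PYTHON_VERSIONS = ["3.9", "3.10", "3.11"]


def build_matrix(python_versions, cuda_majors=None):
    """Return one (python, cuda) entry per conda package that would be built.

    The cuda element is None when the package has no direct CUDA dependency,
    so the matrix size no longer depends on how many CUDA versions exist.
    """
    if not cuda_majors:
        return [(py, None) for py in python_versions]
    return list(product(python_versions, cuda_majors))


per_cuda = build_matrix(PYTHON_VERSIONS, CUDA_MAJORS)  # current behavior
cuda_free = build_matrix(PYTHON_VERSIONS)              # proposed behavior

print(len(per_cuda), len(cuda_free))
```

Adding a third CUDA major version to `CUDA_MAJORS` would grow `per_cuda` to 9 entries while `cuda_free` stays at 3, which is the property the second criterion asks for.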

Approach

Look through the RAPIDS projects for libraries meeting these criteria:

Add them to a task list here.

For each of those, as described in rapidsai/cudf#15245 (comment), modify them as follows:

Notes

Related to #43, which describes changing the workflows for pure-Python packages to only build against one Python version.

Example: custreamz

For example, let's consider custreamz.

Look at these packages on the rapidsai-nightly channel: https://anaconda.org/rapidsai-nightly/custreamz/files?version=24.08.00a66.


Those cuda11_* and cuda12_* packages differ only by their dependency on cuda-version:

  • cuda11_*: cuda-version >=11,<12.0a0
  • cuda12_*: cuda-version >=12,<13.0a0

But custreamz shouldn't need a cuda-version dependency... it only contains Python code and only interacts with cudf and cudf_kafka via their Python APIs.
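
One way to check a package against this observation is sketched below. The build strings and dependency lists are hypothetical stand-ins (only the two `cuda-version` specs come from the listing above), and both helper functions are illustrative names, not part of any conda or RAPIDS tooling.

```python
import re

# Matches a "cuda<N>" component in a build string, e.g. "cuda11_py311_0".
CUDA_BUILD_RE = re.compile(r"(?:^|_)cuda\d+(?:_|$)")


def has_cuda_in_build_string(build):
    """True if the build string contains a cuda<N> component."""
    return CUDA_BUILD_RE.search(build) is not None


def differing_deps(deps_a, deps_b):
    """Return the names of dependencies whose specs differ between two packages."""
    def by_name(deps):
        return {dep.split(" ", 1)[0]: dep for dep in deps}

    a, b = by_name(deps_a), by_name(deps_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Hypothetical metadata for two custreamz builds of the same Python version.
cuda11_pkg = {
    "build": "cuda11_py311_0",
    "depends": ["cuda-version >=11,<12.0a0", "cudf", "cudf_kafka"],
}
cuda12_pkg = {
    "build": "cuda12_py311_0",
    "depends": ["cuda-version >=12,<13.0a0", "cudf", "cudf_kafka"],
}

print(differing_deps(cuda11_pkg["depends"], cuda12_pkg["depends"]))
print(has_cuda_in_build_string(cuda11_pkg["build"]))
```

If the only differing dependency is `cuda-version` and the package itself contains no compiled CUDA code, the two builds are candidates for being collapsed into one.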

The set of changes to address this issue for custreamz should be, roughly:

What about wheels?

This issue is about conda packages only.

Wheels have a different set of concerns, namely that they use suffixed package names to convey CUDA major version support, and that suffix affects everything in the dependency tree.

For example, dask-cudf depends on cudf, and so we publish wheels with names like dask-cudf-cu11 (depending on cudf-cu11) and dask-cudf-cu12 (depending on cudf-cu12).
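
A minimal sketch of why the suffix propagates through the dependency tree, assuming the `-cu<major>` naming convention described above. The `suffixed` and `wheel_requirements` helpers are hypothetical and only illustrate the naming rule, not how RAPIDS actually generates wheel metadata.

```python
def suffixed(name, cuda_major):
    """Apply the -cu<major> wheel suffix to a package name."""
    return f"{name}-cu{cuda_major}"


def wheel_requirements(name, rapids_deps, cuda_major):
    """Return the suffixed wheel name and its suffixed RAPIDS dependencies.

    Every RAPIDS dependency takes the same suffix as the wheel itself, so
    even a pure-Python package needs one wheel per CUDA major version.
    """
    return suffixed(name, cuda_major), [suffixed(dep, cuda_major) for dep in rapids_deps]


for major in (11, 12):
    print(wheel_requirements("dask-cudf", ["cudf"], major))
```

Because the dependency name itself changes with the CUDA major version, a single CUDA-agnostic dask-cudf wheel could not declare its dependency on cudf, which is why this issue is limited to conda packages.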

Tasks

Contributor

bdice commented May 29, 2024

There are some things we'd need to improve in how we handle CI artifacts for packages without a direct CUDA dependency, so that they can be tested on CUDA runners with either CUDA 11 or CUDA 12. cuGraph's CI scripts describe this:

https://github.com/rapidsai/cugraph/blob/9503f31add68ab0bda3982fa069da8f756a187a5/ci/build_python.sh#L37-L40

Contributor

vyasr commented May 30, 2024

I expect the set of packages to be changed here to be a strict subset of those in #43, with the exception being packages like dask-cudf that have a transitive CUDA dependency via one of their dependencies (because as you noted, wheels need to maintain that dependency so that the dependency tree is CUDA aware even if the current package is not).
