Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding wheel build for libcudf #15483

Merged
merged 122 commits into from
Aug 23, 2024
Merged

Conversation

msarahan
Copy link
Contributor

@msarahan msarahan commented Apr 8, 2024

Description

Contributes to rapidsai/build-planning#33

Adds a standalone libcudf wheel, containing the libcudf C++ shared library.

Fixes #16588

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Notes for Reviewers

Dependency Flows

My (@jameslamb)'s interpretation of the state we want to get to with this PR.

---
title: Build dependencies
---
flowchart TD
    A[libcudf] --> B[pylibcudf]
    A --> C[cudf]
    B --> C[cudf]
    B --> F[cudf-kafka]
    D[dask-cudf]
    E[cudf-polars]
    G[custreamz]
Loading
---
title: Runtime dependencies
---
flowchart TD
    A[libcudf] --> B[pylibcudf]
    B --> C[cudf]
    A --> C
    B --> E[cudf-polars]
    C --> D[dask-cudf]
    C --> F[cudf-kafka]
    C --> G[custreamz]
    F --> G
Loading

Size changes

wheel size (before) size (this PR)
libcudf. --- 440M
pylibcudf 469M 16M
cudf 478M 24M
cudf-polars 0.04M 0.04M
dask-cudf 0.05M 0.05M
TOTAL 947M 480M

NOTES: size = compressed, "before" = 2024-08-21 nightlies (58799d6)

how I calculated those (click me)
docker run \
    --rm \
    -v $(pwd):/opt/work:ro \
    -w /opt/work \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2024-08-21 \
    --env RAPIDS_NIGHTLY_SHA=58799d6 \
    --env RAPIDS_PR_NUMBER=15483 \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env RAPIDS_REPOSITORY=rapidsai/cudf \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \
    bash

mkdir -p "${WHEEL_DIR_BEFORE}"
mkdir -p "${WHEEL_DIR_AFTER}"

cpp_projects=(
    libcudf
)
py_projects=(
    cudf
    pylibcudf
)
py_pure_projects=(
    cudf_polars
    dask_cudf
)

# TODO: calculate the date
for project in "${py_projects[@]}"; do
    # before
    RAPIDS_BUILD_TYPE=nightly \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="branch-24.10" \
    RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \
        rapids-download-wheels-from-s3 python "${WHEEL_DIR_BEFORE}"

    # after
    RAPIDS_BUILD_TYPE=pull-request \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \
        rapids-download-wheels-from-s3 python "${WHEEL_DIR_AFTER}"
done


for project in "${py_pure_projects[@]}"; do
    # before
    RAPIDS_BUILD_TYPE=nightly \
    RAPIDS_PY_WHEEL_PURE="1" \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="branch-24.10" \
    RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \
        rapids-download-wheels-from-s3 python "${WHEEL_DIR_BEFORE}"
    
    # after
    RAPIDS_BUILD_TYPE=pull-request \
    RAPIDS_PY_WHEEL_PURE="1" \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \
        rapids-download-wheels-from-s3 python "${WHEEL_DIR_AFTER}"
done

for project in "${cpp_projects[@]}"; do
    # before
    RAPIDS_BUILD_TYPE=nightly \
    RAPIDS_PY_WHEEL_PURE="1" \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="branch-24.10" \
    RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \
        rapids-download-wheels-from-s3 cpp "${WHEEL_DIR_BEFORE}"
    
    # after
    RAPIDS_BUILD_TYPE=pull-request \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \
        rapids-download-wheels-from-s3 cpp "${WHEEL_DIR_AFTER}"
done

du -sh ${WHEEL_DIR_BEFORE}/*
du -sh ${WHEEL_DIR_BEFORE}
du -sh ${WHEEL_DIR_AFTER}/*
du -sh ${WHEEL_DIR_AFTER}

Related work

Corresponding devcontainers PR: rapidsai/devcontainers#271

We should merge that when this is just about ready, then re-run the failing devcontainers CI jobs here.

@github-actions github-actions bot added Python Affects Python cuDF API. CMake CMake build issue labels Apr 8, 2024
@msarahan msarahan added 5 - DO NOT MERGE Hold off on merging; see PR for details and removed Python Affects Python cuDF API. labels Apr 8, 2024
@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 8, 2024
.github/workflows/pr.yaml Outdated Show resolved Hide resolved
@github-actions github-actions bot added the ci label Apr 9, 2024
@msarahan msarahan force-pushed the libcudf-wheel branch 5 times, most recently from 973ce98 to 1ac95d4 Compare April 11, 2024 20:01
@msarahan
Copy link
Contributor Author

Stuck on dlpack here. The failure is:

/__w/cudf/cudf/python/cudf/build/cp310-cp310-manylinux_2_28_aarch64/cudf/_lib/interop.cxx:1443:10: fatal error: dlpack/dlpack.h: No such file or directory
   1443 | #include "dlpack/dlpack.h"
        |          ^~~~~~~~~~~~~~~~~
  compilation terminated.

https://github.com/rapidsai/cudf/actions/runs/8662725105/job/23755706695?pr=15483#step:9:290

msarahan added a commit to msarahan/devcontainers that referenced this pull request Apr 18, 2024
I'm not at all sure that I've done this correctly. This is intended to support local development with a devcontainer for the work in rapidsai/cudf#15483
@msarahan
Copy link
Contributor Author

@vyasr the build is passing now, but the tests seem to show that the library (libcudf.so) can't be found. It does seem like libcudf-cuXY is being installed OK, so I'm thinking it has to be some kind of RPATH issue. I don't see how you've addressed this in the RMM PR. Any advice?

I've inspected one of the libcudf wheels, and it has both lib and lib64 folders. The lib folder has nvcomp stuff, and lib64 has libcudf.so.

@vyasr
Copy link
Contributor

vyasr commented Apr 19, 2024

You won't see this in the rmm PR because rmm is header-only, so there is no library there. The raft PR is a more useful example here. What you want to look at is actually the Python components of the libraft library, specifically the load.py module. The idea is that we never set RPATHs on libraries explicitly. Instead, we use ctypes to dlopen the library before we import any of the extension modules. This is a more robust solution than setting RPATHS because of the various ways in which it is possible for Python packages to coexist on a user's system (different environments, PYTHONPATH modification, etc).

@msarahan msarahan force-pushed the libcudf-wheel branch 2 times, most recently from 5b6bc46 to 580779d Compare April 23, 2024 17:16
@vyasr
Copy link
Contributor

vyasr commented Apr 29, 2024

You have to set the variables before the rapids-cmake clone happens, which in this case is in the include(../../rapids_config.cmake) line. IOW the variables as currently set won't have any effect. Also, I think you'll need to specify the repo as vyasr/rapids-cmake, not just vyasr.

@bdice bdice mentioned this pull request Aug 22, 2024
3 tasks
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error in the pip devcontainers:

  ERROR: Could not find a version that satisfies the requirement libcudf-cu12==24.10.*,>=0.0.0a0 (from versions: none)
  
  ERROR: No matching distribution found for libcudf-cu12==24.10.*,>=0.0.0a0

Is this blocked by rapidsai/devcontainers#271?

Edit: yes. From the description:

We should merge that when this is just about ready, then re-run the failing devcontainers CI jobs here.

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. Really only the build.sh change that's a blocker I think.

build.sh Outdated Show resolved Hide resolved
python -m auditwheel repair -w ${package_dir}/final_dist ${package_dir}/dist/*
python -m auditwheel repair \
--exclude libcudf.so \
--exclude libarrow.so.1601 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on merge order this can be removed.

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

mkdir -p ${package_dir}/final_dist
python -m auditwheel repair --exclude libarrow.so.1601 -w ${package_dir}/final_dist ${package_dir}/dist/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this will be removable soon.

ci/release/update-version.sh Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/CMakeLists.txt Show resolved Hide resolved
AyodeAwe pushed a commit to rapidsai/devcontainers that referenced this pull request Aug 23, 2024
I'm not at all sure that I've done this correctly. This is intended to
support local development with a devcontainer for the work in
rapidsai/cudf#15483

Co-authored-by: James Lamb <[email protected]>
@jameslamb
Copy link
Member

devcontainers job passed after merging rapidsai/devcontainers#271,

https://github.com/rapidsai/cudf/actions/runs/10515765176/job/29173481700?pr=15483

😁 thanks @AyodeAwe

Given that + all the approvals here, I'm gonna merge this. Thanks so much for all the help everyone!!!

@jameslamb
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 8d6b261 into rapidsai:branch-24.10 Aug 23, 2024
88 checks passed
rapids-bot bot pushed a commit that referenced this pull request Aug 23, 2024
…lishing (#16650)

Follow-up to #15483.

Contributes to rapidsai/build-planning#33

Wheel publishing for `libcudf` is failing like this:

```text
Error:  File "./dist/*.whl" does not exist
```

([build link](https://github.com/rapidsai/cudf/actions/runs/10528569930/job/29176811683))

Because the `package-type` was not set to `cpp` in the `wheels-publish` CI workflow, and that workflow defaults to `python`. ([shared-workflows code link](https://github.com/rapidsai/shared-workflows/blob/157e9824e6e2181fca9aa5c4bea4defd4cc322b0/.github/workflows/wheels-publish.yaml#L23-L26)).

This fixes that, and makes that choice explicit for all wheel publishing jobs.

References for this `package-type` argument:

* rapidsai/shared-workflows#209
* rapidsai/gha-tools#105

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Bradley Dice (https://github.com/bdice)

URL: #16650
rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this pull request Aug 29, 2024
Contributes to rapidsai/build-planning#33.

Proposes the following for `cuspatial` wheels:

* add build and runtime dependencies on `libcudf` wheels
* stop vendoring copies of `libcudf.so`, `libnvcomp.so`, `libnvcomp_bitcomp.so`, and `libnvcomp_gdeflate.so`
  - *(load `libcudf.so` dynamically at runtime instead)*

And other related changes for development/CI:

* combine all `pip install` calls into 1 in wheel-testing scripts
  - *like rapidsai/cudf#16575
  - *to improve the chance that packaging issues are discovered in CI*
* `dependencies.yaml` changes:
   - more use of YAML anchors = less duplication
   - use dedicated `depends_on_librmm` and `depends_on_libcudf` groups
* explicitly pass a package type to `gha-tools` wheel uploading/downloading scripts

## Notes for Reviewers

### Benefits of these changes

Unblocks CI in this repo (ref: #1444 (comment), #1441 (comment)).

Reduces wheel sizes for `cuspatial` wheels by about 125MB 😁 

| wheel          | size (before)  | size (this PR) |
|:-----------:|-------------:|---------------:|
| `cuspatial` |   146.0M        |   21M               |
| `cuproj `     |       0.9M       |   0.9M              |
|**TOTAL**   |  **146.9M** | **21.9M**        |

*NOTES: size = compressed, "before" = 2024-08-21 nightlies (c60bd4d), CUDA = 12, Python = 3.11*

<details><summary>how I calculated those (click me)</summary>

```shell
# note: 2024-08-21 because that was the most recent date with
#           successfully-built cuspatial nightlies
#
docker run \
    --rm \
    -v $(pwd):/opt/work:ro \
    -w /opt/work \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2024-08-21 \
    --env RAPIDS_NIGHTLY_SHA=c60bd4d \
    --env RAPIDS_PR_NUMBER=1447 \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env RAPIDS_REPOSITORY=rapidsai/cuspatial \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \
    bash

mkdir -p "${WHEEL_DIR_BEFORE}"
mkdir -p "${WHEEL_DIR_AFTER}"

py_projects=(
    cuspatial
    cuproj
)

for project in "${py_projects[@]}"; do
    # before
    RAPIDS_BUILD_TYPE=nightly \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="branch-24.10" \
    RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \
        rapids-download-wheels-from-s3 python "${WHEEL_DIR_BEFORE}"

    # after
    RAPIDS_BUILD_TYPE=pull-request \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \
        rapids-download-wheels-from-s3 python "${WHEEL_DIR_AFTER}"
done

du -sh ${WHEEL_DIR_BEFORE}/*
du -sh ${WHEEL_DIR_BEFORE}
du -sh ${WHEEL_DIR_AFTER}/*
du -sh ${WHEEL_DIR_AFTER}
```

</details>

Reduces the amount of additional work required to start shipping `libcuspatial` wheels.

### Background

This is part of ongoing work towards packaging `libcuspatial` as a wheel.

relevant prior work:

* packaging `libcudf` wheels: rapidsai/cudf#15483
* consolidating `pip install` calls in CI scripts for `cudf`: rapidsai/cudf#16575
* `cudf` dropping its Arrow library dependency: rapidsai/cudf#16640

### How I tested this

Confirmed in local builds and CI logs that `cudf` is being *found*, not *built*, in `cuspatial` builds.

```text
-- CPM: Using local package [email protected]
```

([build link](https://github.com/rapidsai/cuspatial/actions/runs/10602971716/job/29386288614?pr=1447#step:9:23472))

Built `cuspatial` wheels locally and ran all the unit tests, without issue.

#

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #1447
rapids-bot bot pushed a commit that referenced this pull request Aug 31, 2024
Follow-up to #16650 and #15483.

`libcudf` wheels are identical (same content, same filename) across Python versions, but due to an oversight in the PRs linked above, we're currently building nightlies of them once per Python version supported by RAPIDS 😭 

You can see this on recent runs of the `build` workflow:

<img width="752" alt="image" src="https://github.com/user-attachments/assets/ba3a2192-1752-4d32-a79b-6f238fae9f18">

([build link](https://github.com/rapidsai/cudf/actions/runs/10627299703/job/29460218854))

This PR fixes that by applying the same matrix filter to `libcudf` nightly build jobs as is currently applied to PR jobs:

https://github.com/rapidsai/cudf/blob/5e420ff63ba2997a37bf5dfbfaa73c5f05225f9d/.github/workflows/pr.yaml#L195-L200

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Bradley Dice (https://github.com/bdice)

URL: #16714
rapids-bot bot pushed a commit that referenced this pull request Sep 10, 2024
Follow-up to #15483.
Contributes to rapidsai/build-planning#33.

Adds a build-time dependency on `libkvikio` wheels for `libcudf` wheels (per #15483 (comment)).

With this change, CPM is no longer used to download and install the kvikio headers.

Before:

```text
  -- Found cuFile: /usr/local/cuda/lib64/libcufile.so
  -- CPM: Adding package [email protected] (branch-24.10)
```

([recent build link from branch-24.10](https://github.com/rapidsai/cudf/actions/runs/10780576194/job/29896649202#step:9:7673))

After:

```text
  -- KvikIO: Found cuFile Batch API: TRUE
  -- KvikIO: Found cuFile Stream API: TRUE
  -- CPM: Using local package [email protected]
```

([build link from this PR](https://github.com/rapidsai/cudf/actions/runs/10780504202/job/29896555443?pr=16778#step:9:7754))

## Notes for Reviewers

### This removes kvikio headers/CMake files from libcudf wheels

Cuts around 0.8 MB (23 files) out of `libcudf` wheels.

As of this PR, these would no longer be vendored in `libcudf` wheels:

```text
    0  09-08-2024 06:17   libcudf/include/kvikio/
    0  09-08-2024 06:17   libcudf/include/kvikio/shim/
 6356  09-08-2024 06:17   libcudf/include/kvikio/batch.hpp
 3812  09-08-2024 06:17   libcudf/include/kvikio/buffer.hpp
10499  09-08-2024 06:17   libcudf/include/kvikio/utils.hpp
 1399  09-08-2024 06:17   libcudf/include/kvikio/cufile_config.hpp
33385  09-08-2024 06:17   libcudf/include/kvikio/file_handle.hpp
 7299  09-08-2024 06:17   libcudf/include/kvikio/driver.hpp
 9678  09-08-2024 06:17   libcudf/include/kvikio/defaults.hpp
 5352  09-08-2024 06:17   libcudf/include/kvikio/stream.hpp
 6002  09-08-2024 06:17   libcudf/include/kvikio/error.hpp
 4501  09-08-2024 06:17   libcudf/include/kvikio/bounce_buffer.hpp
 3197  09-08-2024 06:17   libcudf/include/kvikio/parallel_operation.hpp
 9864  09-08-2024 06:17   libcudf/include/kvikio/posix_io.hpp
  717  09-08-2024 06:17   libcudf/include/kvikio/version_config.hpp
 4529  09-08-2024 06:17   libcudf/include/kvikio/shim/cuda.hpp
 3331  09-08-2024 06:17   libcudf/include/kvikio/shim/utils.hpp
 4055  09-08-2024 06:17   libcudf/include/kvikio/shim/cufile_h_wrapper.hpp
 2242  09-08-2024 06:17   libcudf/include/kvikio/shim/cuda_h_wrapper.hpp
 7510  09-08-2024 06:17   libcudf/include/kvikio/shim/cufile.hpp
    0  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/
 5031  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/kvikio-targets.cmake
 3681  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/kvikio-config-version.cmake
 6915  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/kvikio-config.cmake
 1529  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/kvikio-dependencies.cmake
 3851  09-08-2024 06:17   libcudf/lib64/cmake/kvikio/FindcuFile.cmake
```

This is safe because kvikio is a PRIVATE dependency of `libcudf`.

https://github.com/rapidsai/cudf/blob/150f1b10ed9c702d5283216b746df685e1708716/cpp/CMakeLists.txt#L796-L802


#

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #16778
rapids-bot bot pushed a commit that referenced this pull request Sep 25, 2024
This CMake option was removed by #15483.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #16879
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team ci CMake CMake build issue conda improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Status: Done
Development

Successfully merging this pull request may close these issues.

Update update-version.sh for pylibcudf
7 participants