Refactor the `cuda_memcpy` functions to make them more usable #16945

vuule · 2024-09-27T16:43:47Z

Description

As we expanded the use of the cuda_memcpy functions, we realized that they are not very ergonomic: they require the caller to query is_device_accessible and pass the correct PAGEABLE/PINNED enum based on the result.

This PR aims to make the cuda_memcpy functions easier to use, and the call site changes hopefully showcase this. The new implementation takes spans as parameters and relies on host_span::is_device_accessible to enable copy strategies for pinned memory. Host spans set this flag during construction; creating a host span from a cudf::detail::host_vector will correctly propagate is_device_accessible. Thus, callers can simply* call the cuda_memcpy functions with their containers as parameters and rely on implicit conversion to host_span/device_span. Bonus - there's no way to mix up host and device memory pointers 👍
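To illustrate the idea of letting the span carry the pinned/pageable information, here is a minimal host-only sketch. The types host_span_sketch and device_span_sketch, the copy_path enum, and copy_to_device_sketch are hypothetical stand-ins, not the cudf classes or the code in this PR, and std::memcpy stands in for the actual CUDA copy. The sketch copies the smaller of the two sizes, matching the behavior described in the sharp edges below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for illustration only; not the cudf span classes.
enum class copy_path { PINNED, PAGEABLE };

template <typename T>
struct host_span_sketch {
  T* ptr;
  std::size_t count;
  bool device_accessible;  // recorded when the span is constructed
  T* data() const { return ptr; }
  std::size_t size_bytes() const { return count * sizeof(T); }
  bool is_device_accessible() const { return device_accessible; }
};

template <typename T>
struct device_span_sketch {
  T* ptr;
  std::size_t count;
  T* data() const { return ptr; }
  std::size_t size_bytes() const { return count * sizeof(T); }
};

// The copy function reads the flag from the span, so the caller passes no enum.
// Distinct span types for source and destination prevent swapping the two.
template <typename T>
copy_path copy_to_device_sketch(device_span_sketch<T> dst, host_span_sketch<T const> src)
{
  auto const path = src.is_device_accessible() ? copy_path::PINNED : copy_path::PAGEABLE;
  // std::memcpy is a placeholder for the real device copy.
  std::memcpy(dst.data(), src.data(), std::min(dst.size_bytes(), src.size_bytes()));
  return path;
}

inline void example_usage()
{
  std::vector<int> host{1, 2, 3};
  std::vector<int> fake_device(3);  // plain host memory standing in for a device buffer
  auto const path =
    copy_to_device_sketch(device_span_sketch<int>{fake_device.data(), fake_device.size()},
                          host_span_sketch<int const>{host.data(), host.size(), true});
  assert(path == copy_path::PINNED);
  assert(fake_device == host);
}
```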

Sharp edges:

Conversion prevents template deduction, so calls that pass containers as parameters need to specify the template parameter (see changes in this PR).
The API copies the min(input.size(), output.size()) bytes, as this is what we can do safely. This might cause surprises to users if they unintentionally pass spans of different sizes. We could instead throw in this case.
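The first sharp edge is a general C++ rule rather than something specific to cudf: template argument deduction does not look through user-defined conversions. A minimal sketch, using a hypothetical span_sketch type and take_span function that are not part of this PR:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical span type with an implicit converting constructor from a container.
template <typename T>
struct span_sketch {
  T* ptr;
  std::size_t count;
  span_sketch(std::vector<T>& v) : ptr{v.data()}, count{v.size()} {}
  std::size_t size() const { return count; }
};

template <typename T>
std::size_t take_span(span_sketch<T> s)
{
  return s.size();
}

inline void example_usage()
{
  std::vector<int> v{1, 2, 3};
  // take_span(v);  // does not compile: T cannot be deduced through the conversion
  auto const n = take_span<int>(v);  // explicit T lets the implicit conversion apply
  assert(n == 3);
}
```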

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.


bdice · 2024-09-27T19:42:12Z

cpp/include/cudf/detail/utilities/cuda_memcpy.hpp

+  impl::cuda_memcpy_async(
+    dst.data(),
+    src.data(),
+    std::min(dst.size_bytes(), src.size_bytes()),


This might cause surprises to users if they unintentionally pass spans of different sizes. We could instead throw in this case.

Yes, this should be a runtime check and it should throw. If the caller wants to copy subspans, the caller can create subspans. Spans, as view types, are meant to make this easy.
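A rough sketch of what this suggestion could look like, on host memory only. The span_sketch type and checked_copy_sketch function are hypothetical, std::invalid_argument is a placeholder for whatever exception the library would actually throw, and std::memcpy stands in for the device copy:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical view type for illustration; not the cudf span classes.
template <typename T>
struct span_sketch {
  T* ptr;
  std::size_t count;
  T* data() const { return ptr; }
  std::size_t size() const { return count; }
  std::size_t size_bytes() const { return count * sizeof(T); }
  span_sketch subspan(std::size_t offset, std::size_t n) const { return {ptr + offset, n}; }
};

// Refuses mismatched sizes instead of silently copying the smaller amount.
template <typename T>
void checked_copy_sketch(span_sketch<T> dst, span_sketch<T const> src)
{
  if (dst.size() != src.size()) { throw std::invalid_argument("span sizes differ"); }
  std::memcpy(dst.data(), src.data(), src.size_bytes());
}

inline void example_usage()
{
  std::vector<int> source{1, 2, 3, 4};
  std::vector<int> target(2);
  span_sketch<int const> src{source.data(), source.size()};
  span_sketch<int> dst{target.data(), target.size()};

  bool threw = false;
  try {
    checked_copy_sketch(dst, src);  // 2 elements vs 4 elements
  } catch (std::invalid_argument const&) {
    threw = true;
  }
  assert(threw);

  // A caller that really wants a partial copy says so with a subspan.
  checked_copy_sketch(dst, src.subspan(0, 2));
  assert(target[0] == 1 && target[1] == 2);
}
```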

Sure.
Just one thing to keep in mind for this use case:
host_span{my_hv.data(), subsize} is not the same as host_span{my_hv}.subspan(0, subsize) because the first one will not know if it's pointing to pinned memory.
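The difference can be sketched with two hypothetical types, host_vector_sketch and host_span_sketch. These are simplified stand-ins, not the cudf classes, and the assumption that a span built from a raw pointer defaults to "not device accessible" is taken from the comment above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container that knows whether its storage is pinned.
struct host_vector_sketch {
  std::vector<int> storage;
  bool pinned;
};

// Hypothetical span that records device accessibility at construction.
struct host_span_sketch {
  int* ptr;
  std::size_t count;
  bool device_accessible;

  // From a raw pointer, the span has no way to know the memory is pinned.
  host_span_sketch(int* p, std::size_t n) : ptr{p}, count{n}, device_accessible{false} {}

  // From the container, the flag is propagated.
  host_span_sketch(host_vector_sketch& v)
    : ptr{v.storage.data()}, count{v.storage.size()}, device_accessible{v.pinned}
  {
  }

  // A subspan views the same allocation, so it keeps the flag.
  host_span_sketch subspan(std::size_t offset, std::size_t n) const
  {
    host_span_sketch out{ptr + offset, n};
    out.device_accessible = device_accessible;
    return out;
  }

  bool is_device_accessible() const { return device_accessible; }
};

inline void example_usage()
{
  host_vector_sketch my_hv{std::vector<int>(8), true};
  host_span_sketch from_pointer{my_hv.storage.data(), 4};
  host_span_sketch from_subspan = host_span_sketch{my_hv}.subspan(0, 4);
  assert(!from_pointer.is_device_accessible());  // pinned information lost
  assert(from_subspan.is_device_accessible());   // pinned information kept
}
```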

vuule added 4 commits September 26, 2024 16:23

initial impl

a272fe7

Merge branch 'branch-24.12' of https://github.com/rapidsai/cudf into …

7c1c918

…fea-host-device-copy

rework API

c0a2e71

impl fix

db97c3d

vuule added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Sep 27, 2024

vuule self-assigned this Sep 27, 2024

github-actions bot added the libcudf (Affects libcudf (C++/CUDA) code) label Sep 27, 2024

docs

80047eb

vuule mentioned this pull request Sep 27, 2024

Add APIs to copy data to and from device to cudf::detail::host_vector #16931

Closed

3 tasks

bdice reviewed Sep 27, 2024


vuule added 2 commits September 27, 2024 13:06

throw when mismatched sizes

6cf40b3

Merge branch 'branch-24.12' into fea-host-device-copy

7414926

vuule mentioned this pull request Sep 28, 2024

Extend device_scalar to optionally use pinned bounce buffer #16947

Draft

3 tasks

vuule marked this pull request as ready for review September 30, 2024 16:55

vuule requested a review from a team as a code owner September 30, 2024 16:55

vuule requested review from vyasr and pmattione-nvidia September 30, 2024 16:55

pmattione-nvidia approved these changes Sep 30, 2024

