Enable use of `cudaMemcpyAsync` for `thrust::copy` #211

miscco · 2023-07-12T15:34:49Z

In case of contigous ranges of trivially relocatable types we can directly utilize cudaMemcpyAsync instead of going through transform.

Fixes NVIDIA/thrust#1536

jrhemstad · 2023-07-12T15:38:52Z

@senior-zero does this resolve NVIDIA/thrust#1536 ?

In case of contigous ranges of trivially relocatable types we can directly utilize `cudaMemcpyAsync` instead of going through transform. Fixes NVIDIA#210

gevtushenko

This PR is potentially breaking on systems with concurrentManagedAccess=false (windows?) when used with unified memory (details). Ideally, we should check if concurrentManagedAccess=1 before dispatching to memcpy. Since this issue already exists in the case of cross-system copy, I think we can proceed with this PR and prioritize #210 (@jrhemstad @gonzalobg).

Something is going wrong in `thrust::copy`, so I am adding a patch to revert NVIDIA/cccl#211. This will fix the failing tests for rapidsai/cuspatial#1315. This is where it's failing, with `thrust::copy`: https://github.com/rapidsai/cuspatial/blob/8e8fa0689339c0288746b01a8af66fb599f5c770/cpp/tests/range/multipolygon_range_test.cu#L800 CI log: https://github.com/rapidsai/cuspatial/actions/runs/7229405991/job/19700477218?pr=1315#step:7:3280 ``` [ RUN ] MultipolygonRangeOneTest/1.OneMultipolygonRange unknown file: Failure C++ exception with description "__copy:: D->D: failed: cudaErrorInvalidValue: invalid argument" thrown in the test body. [ FAILED ] MultipolygonRangeOneTest/1.OneMultipolygonRange, where TypeParam = double (0 ms) ``` I traced it down to this CCCL PR - note that `thrust::transform(..., thrust::identity{})` works, but the new `thrust::copy` code path in NVIDIA/cccl#211 that calls `cudaMemcpyAsync` doesn't work for some reason. I was not able to identify the exact reason why that PR causes the failure. Patching this out will unblock RAPIDS and let us migrate forward with CCCL 2.2.0. This helps with #502. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Michael Schellenberger Costa (https://github.com/miscco) - Robert Maynard (https://github.com/robertmaynard) URL: #511

miscco added thrust For all items related to Thrust. bug Something isn't working right. labels Jul 12, 2023

miscco requested a review from gevtushenko July 12, 2023 15:34

miscco self-assigned this Jul 12, 2023

Enable use of cudaMemcpyAsync for thrust::copy

c659362

In case of contigous ranges of trivially relocatable types we can directly utilize `cudaMemcpyAsync` instead of going through transform. Fixes NVIDIA#210

miscco force-pushed the utilize_cudaMemcpyAsync branch from 3f8dedd to c659362 Compare July 14, 2023 08:46

gevtushenko approved these changes Jul 14, 2023

View reviewed changes

miscco merged commit ce6a462 into NVIDIA:main Jul 18, 2023
368 checks passed

miscco deleted the utilize_cudaMemcpyAsync branch July 18, 2023 07:29

bdice mentioned this pull request Dec 19, 2023

Add patch reverting CCCL PR 211. rapidsai/rapids-cmake#511

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable use of `cudaMemcpyAsync` for `thrust::copy` #211

Enable use of `cudaMemcpyAsync` for `thrust::copy` #211

miscco commented Jul 12, 2023 •

edited by jrhemstad

Loading

jrhemstad commented Jul 12, 2023

gevtushenko left a comment

Enable use of cudaMemcpyAsync for thrust::copy #211

Enable use of cudaMemcpyAsync for thrust::copy #211

Conversation

miscco commented Jul 12, 2023 • edited by jrhemstad Loading

jrhemstad commented Jul 12, 2023

gevtushenko left a comment

Choose a reason for hiding this comment

Enable use of `cudaMemcpyAsync` for `thrust::copy` #211

Enable use of `cudaMemcpyAsync` for `thrust::copy` #211

miscco commented Jul 12, 2023 •

edited by jrhemstad

Loading