Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable use of cudaMemcpyAsync for thrust::copy #211

Merged
merged 1 commit into from
Jul 18, 2023

Conversation

miscco
Copy link
Collaborator

@miscco miscco commented Jul 12, 2023

In case of contigous ranges of trivially relocatable types we can directly utilize cudaMemcpyAsync instead of going through transform.

Fixes NVIDIA/thrust#1536

@miscco miscco added thrust For all items related to Thrust. bug Something isn't working right. labels Jul 12, 2023
@miscco miscco requested a review from gevtushenko July 12, 2023 15:34
@miscco miscco self-assigned this Jul 12, 2023
@jrhemstad
Copy link
Collaborator

@senior-zero does this resolve NVIDIA/thrust#1536 ?

In case of contigous ranges of trivially relocatable types we can directly utilize `cudaMemcpyAsync` instead of going through transform.

Fixes NVIDIA#210
Copy link
Collaborator

@gevtushenko gevtushenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is potentially breaking on systems with concurrentManagedAccess=false (windows?) when used with unified memory (details). Ideally, we should check if concurrentManagedAccess=1 before dispatching to memcpy. Since this issue already exists in the case of cross-system copy, I think we can proceed with this PR and prioritize #210 (@jrhemstad @gonzalobg).

@miscco miscco merged commit ce6a462 into NVIDIA:main Jul 18, 2023
368 checks passed
@miscco miscco deleted the utilize_cudaMemcpyAsync branch July 18, 2023 07:29
rapids-bot bot pushed a commit to rapidsai/rapids-cmake that referenced this pull request Dec 19, 2023
Something is going wrong in `thrust::copy`, so I am adding a patch to revert NVIDIA/cccl#211. This will fix the failing tests for rapidsai/cuspatial#1315.

This is where it's failing, with `thrust::copy`:
https://github.com/rapidsai/cuspatial/blob/8e8fa0689339c0288746b01a8af66fb599f5c770/cpp/tests/range/multipolygon_range_test.cu#L800

CI log: https://github.com/rapidsai/cuspatial/actions/runs/7229405991/job/19700477218?pr=1315#step:7:3280

```
[ RUN      ] MultipolygonRangeOneTest/1.OneMultipolygonRange
unknown file: Failure
C++ exception with description "__copy:: D->D: failed: cudaErrorInvalidValue: invalid argument" thrown in the test body.

[  FAILED  ] MultipolygonRangeOneTest/1.OneMultipolygonRange, where TypeParam = double (0 ms)
```

I traced it down to this CCCL PR - note that `thrust::transform(..., thrust::identity{})` works, but the new `thrust::copy` code path in NVIDIA/cccl#211 that calls `cudaMemcpyAsync` doesn't work for some reason. I was not able to identify the exact reason why that PR causes the failure. Patching this out will unblock RAPIDS and let us migrate forward with CCCL 2.2.0.

This helps with #502.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #511
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right. thrust For all items related to Thrust.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Optimize device to device copying
3 participants