Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Review] Add stream support for static_map execution #113

Merged
merged 5 commits into from
Dec 7, 2021

Conversation

chirayuG-nvidia
Copy link
Contributor

Adding plumbing for execution static_map functions on user supplied cuda stream.

include/cuco/static_map.cuh Outdated Show resolved Hide resolved
Comment on lines 35 to 38
slots_ = std::allocator_traits<slot_allocator_type>::allocate(slot_allocator_, capacity_,
exec_stream_);
num_successes_ = std::allocator_traits<counter_allocator_type>::allocate(counter_allocator_, 1,
exec_stream_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
slots_ = std::allocator_traits<slot_allocator_type>::allocate(slot_allocator_, capacity_,
exec_stream_);
num_successes_ = std::allocator_traits<counter_allocator_type>::allocate(counter_allocator_, 1,
exec_stream_);
slots_ = slot_allocator_.allocate(capacity_, exec_stream_);
num_successes_ = counter_allocator_.allocate(1, exec_stream_);

I could be wrong but passing CUDA stream as a hint to std::allocator_traits::allocate (see here) may not be a proper use case.

Copy link
Contributor Author

@chirayuG-nvidia chirayuG-nvidia Nov 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed it can given this function taking stream argument. Please let me know what is the correct use here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the stream to allocator_traits is not going to work. You need to call allocate directly like Yunsong showed.

Copy link
Member

@PointKernel PointKernel Nov 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will indeed invoke cuco::allocator::allocate but the stream argument won't be passed properly in this way. It doesn't matter much with cuco::allocator since it takes stream but does not really use it. A real case would be using e.g. rmm allocator allocate. My suggestions will work just fine.

Comment on lines 52 to 53
std::allocator_traits<slot_allocator_type>::deallocate(slot_allocator_, slots_, capacity_);
std::allocator_traits<counter_allocator_type>::deallocate(counter_allocator_, num_successes_, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::allocator_traits<slot_allocator_type>::deallocate(slot_allocator_, slots_, capacity_);
std::allocator_traits<counter_allocator_type>::deallocate(counter_allocator_, num_successes_, 1);
counter_allocator_.deallocate(num_successes_, 1, exec_stream_);
slot_allocator_.deallocate(slots_, capacity_, exec_stream_);

stream));
// stream'd execution assumes sync not required
if (stream == NULL)
CUCO_CUDA_TRY(cudaDeviceSynchronize()); // ensures legacy behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CUCO_CUDA_TRY(cudaDeviceSynchronize()); // ensures legacy behavior
CUCO_CUDA_TRY(cudaStreamSynchronize(stream)); // ensures legacy behavior

We should replace all device sync with stream sync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with replacing deviceSync with streamSync, however this change may reveal unintentional data race in applications that unknowingly benefitted from the device level synchronization provided by cudaDeviceSynchronize()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to worry about people inadvertently relying on the device sync here. They should not have been doing that in the first place.

@PointKernel
Copy link
Member

Can you please update the docs also since we are adding a new argument to bulk functions?

@chirayuG-nvidia chirayuG-nvidia changed the title [WIP] Add stream support for static_map execution [Review] Add stream support for static_map execution Nov 8, 2021
@chirayuG-nvidia
Copy link
Contributor Author

Added unit test and addressed review comments.

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -42,6 +44,7 @@ int main(void)

// Inserts all pairs into the map
map.insert(pairs.begin(), pairs.end());
cudaStreamSynchronize(str);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sync isn't really necessary

include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
include/cuco/detail/static_map.inl Outdated Show resolved Hide resolved
@PointKernel PointKernel added type: feature request New feature request helps: rapids Helps or needed by RAPIDS topic: static_map Issue related to the static_map labels Dec 3, 2021
@PointKernel PointKernel merged commit 76a6ba1 into NVIDIA:dev Dec 7, 2021
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Jan 19, 2022
cuCollection added stream support for static_map in NVIDIA/cuCollections#113

This allows us to avoid unnecessary stream/device synchronizations and also allows to enable multi-stream execution without unnecessary serialization.

This PR updates libcugraph code to pass `handle.get_stream()` to `cuco::static_map`.

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Kumar Aatish (https://github.com/kaatish)

URL: #1984
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
helps: rapids Helps or needed by RAPIDS topic: static_map Issue related to the static_map type: feature request New feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants