Enable prefetching in cudf.pandas.install() #16439

Merged: 2 commits, Jul 31, 2024

@bdice (Contributor) commented Jul 30, 2024

Description

This PR enables cudf.pandas managed memory prefetching in cudf.pandas.install(), to ensure that prefetching is enabled for all methods of enabling cudf.pandas.

I also fixed a bug in libcudf's prefetching logic: it tried to compute the number of characters in a strings column view even when the view's data pointer was nullptr. That call throws, so we must skip the chars_size() call and stop the prefetch attempt early.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice added the bug (Something isn't working), non-breaking (Non-breaking change), and cudf.pandas (Issues specific to cudf.pandas) labels Jul 30, 2024
@bdice self-assigned this Jul 30, 2024
@github-actions bot added the Python (Affects Python cuDF API.) label Jul 30, 2024
@bdice changed the base branch from branch-24.10 to branch-24.08 July 30, 2024 20:01
@github-actions bot added the libcudf (Affects libcudf (C++/CUDA) code.) label Jul 30, 2024
Comment on lines +48 to +51
  if (data_ptr == nullptr) {
    // Do not call chars_size if the data_ptr is nullptr.
    return;
  }
Contributor Author

This was a bug. We can't call chars_size on string column views that are not fully initialized like this one:

// Since we are setting every row to the scalar, the fill() never needs to access
// any of the children in the strings column which would otherwise cause an exception.
column_view sc{value.type(), size, nullptr, nullptr, 0};

Errors from calling chars_size() looked like this:

terminate called after throwing an instance of 'cudf::logic_error'
  what():  CUDF failure at: /home/coder/cudf/cpp/src/strings/strings_column_view.cpp:34: strings column has no children
Aborted (core dumped)
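
For readers who want to see the failure mode in isolation, here is a minimal sketch of the early return. It does not use libcudf: `mock_strings_view`, `num_children`, `chars_bytes`, and `bytes_to_prefetch` are hypothetical names invented for illustration, and the mock only imitates the "strings column has no children" exception quoted above.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a strings column view. This is NOT the libcudf
// class; it only imitates the behavior described in the comment above.
struct mock_strings_view {
  const char* data_ptr;      // may be nullptr for a partially initialized view
  std::size_t num_children;  // 0 when the view was built without children
  std::size_t chars_bytes;   // size of the character data, if any

  // Imitates the reported error: asking for the character count on a view
  // without children throws.
  std::size_t chars_size() const
  {
    if (num_children == 0) { throw std::logic_error("strings column has no children"); }
    return chars_bytes;
  }
};

// Returns the number of bytes that would be prefetched, or 0 when the
// prefetch is skipped. The nullptr check runs before chars_size(), so the
// throwing call is never reached for uninitialized views.
std::size_t bytes_to_prefetch(mock_strings_view const& view)
{
  if (view.data_ptr == nullptr) {
    // Do not call chars_size if the data_ptr is nullptr.
    return 0;
  }
  return view.chars_size();
}
```

With this ordering, a view built like `mock_strings_view{nullptr, 0, 0}` is skipped quietly instead of terminating the process.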

Comment on lines +54 to +67
  // Don't try to prefetch nullptrs or empty data. Sometimes libcudf has column
  // views that use nullptrs with a nonzero size as an optimization.
  if (ptr == nullptr) {
    if (prefetch_config::instance().debug) {
      std::cerr << "Skipping prefetch of nullptr" << std::endl;
    }
    return cudaSuccess;
  }
  if (size == 0) {
    if (prefetch_config::instance().debug) {
      std::cerr << "Skipping prefetch of size 0" << std::endl;
    }
    return cudaSuccess;
  }
Contributor Author

I'm not sure whether this is necessary, but it seems safer for the library to avoid calling the cudaMemPrefetchAsync API unless we know we have a non-null pointer and a nonzero size.
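
The guard pattern can be exercised without a GPU. The sketch below is an assumption-laden illustration, not the code in this PR: `status`, `backend_calls`, `backend_prefetch`, and `guarded_prefetch` are hypothetical names, and the stub stands in for the real asynchronous prefetch call so the example compiles with only the standard library.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical status codes standing in for CUDA error values.
enum class status { success, error };

// Hypothetical counter that records how often the stubbed backend is reached.
int backend_calls = 0;

// Stub for the real asynchronous prefetch call. It performs no memory
// operation; it only records that it was invoked.
status backend_prefetch(void const* /*ptr*/, std::size_t /*size*/)
{
  ++backend_calls;
  return status::success;
}

// Skips the backend call for null pointers and empty ranges, reporting
// success in both cases so callers need no special handling.
status guarded_prefetch(void const* ptr, std::size_t size, bool debug = false)
{
  if (ptr == nullptr) {
    if (debug) { std::cerr << "Skipping prefetch of nullptr" << std::endl; }
    return status::success;
  }
  if (size == 0) {
    if (debug) { std::cerr << "Skipping prefetch of size 0" << std::endl; }
    return status::success;
  }
  return backend_prefetch(ptr, size);
}
```

Returning success for the skipped cases treats "nothing to prefetch" as a no-op rather than an error, which matches the intent stated in the code comment above.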

@bdice bdice marked this pull request as ready for review July 30, 2024 21:48
@bdice bdice requested review from a team as code owners July 30, 2024 21:48
@bdice bdice requested review from isVoid, charlesbluca, davidwendt and pmattione-nvidia and removed request for a team July 30, 2024 21:48
Contributor

@davidwendt davidwendt left a comment

Approving C++ changes.

@vyasr
Contributor

vyasr commented Jul 30, 2024

Verified that this works in local testing.

@raydouglass raydouglass merged commit 1f7aae0 into rapidsai:branch-24.08 Jul 31, 2024
90 of 91 checks passed
Labels: bug, cudf.pandas, libcudf, non-breaking, Python
Projects
Status: Done
5 participants