
Add Statistics Resource Adaptor and cython bindings to tracking_resource_adaptor and statistics_resource_adaptor #626

Conversation

mdemoret-nv
Contributor

mdemoret-nv commented Nov 10, 2020

Closes #622 and Closes #623

This PR updates the C++ tracking_resource_adaptor with stack trace information and also adds a new MR, statistics_resource_adaptor. Summary of all changes:

  • tracking_resource_adaptor changes:
    • Added Cython wrapper rmm.mr.TrackingResourceAdaptor which wraps all available methods
    • Updated tracking_resource_adaptor to correctly log stack trace information with capture_stacks=True
  • Added statistics_resource_adaptor memory resource:
    • This MR will keep track of the current, peak and total allocated bytes and number of allocations
    • Added Cython wrapper rmm.mr.StatisticsResourceAdaptor which wraps all available methods

These two MRs can be used separately or together to track memory allocations, check for memory leaks, and identify incorrect deallocations. While both MRs can track the current number of allocations, they have different areas of focus.

The tracking_resource_adaptor is geared toward identifying and fixing memory leaks, and logs stack trace information for every memory allocation. This MR has a significant performance impact since it records a large amount of information per allocation.

The statistics_resource_adaptor is a lightweight MR that adds simple counters to track allocated bytes and allocation counts. It has a much smaller performance impact but cannot identify the cause of memory leaks, only that they exist. It is also well suited to tracking peak memory usage and can help identify areas that require large amounts of memory or measure memory usage reductions during optimization.
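As a rough, self-contained sketch of the counter bookkeeping a statistics-style adaptor performs (a hypothetical Python stand-in for illustration, not the actual rmm C++ or Cython API):

```python
class StatisticsCounter:
    """Minimal stand-in for the counters a statistics adaptor keeps.

    Tracks current, peak, and total allocated bytes and allocation counts.
    """

    def __init__(self):
        self.current_bytes = 0
        self.peak_bytes = 0
        self.total_bytes = 0
        self.current_count = 0
        self.peak_count = 0
        self.total_count = 0

    def allocate(self, nbytes):
        self.current_bytes += nbytes
        self.total_bytes += nbytes
        self.current_count += 1
        self.total_count += 1
        # Peak is the running maximum of the current values.
        self.peak_bytes = max(self.peak_bytes, self.current_bytes)
        self.peak_count = max(self.peak_count, self.current_count)

    def deallocate(self, nbytes):
        self.current_bytes -= nbytes
        self.current_count -= 1


stats = StatisticsCounter()
stats.allocate(100)
stats.allocate(50)
stats.deallocate(100)
# current_bytes == 50, peak_bytes == 150, total_bytes == 150
```

Current values fall on deallocation while peak and total only ever grow, which is what makes the adaptor useful for spotting both leaks (nonzero current) and high-water marks (peak).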

@GPUtester
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@harrism
Member

harrism commented Nov 10, 2020

To track usage per test in pytest (the main requirement from cuML) I added a reset_allocation_counts() function which will reset all values back to 0. This has a side effect of making it possible to get negative values for current_bytes and current_count (which needed to be changed to ssize_t)

1. I'm not sure how the rmm team will feel about allowing negative numbers for count/bytes totals. I experimented with a couple other designs (such as a push/pop system) that increased the complexity of the design and made it difficult to track the peak allocation correctly.

2. Technically, negative values were possible before by passing the incorrect `bytes` value to `deallocate()` (which would wrap the `size_t` value)

If you must have a reset method, why not just reset the peak and total, but not the current allocations? This way the current values can never disagree with the map. The ideal usage should be to reset after freeing all outstanding allocations, in which case resetting the peak and total would make all things zero.

@mdemoret-nv
Contributor Author

If you must have a reset method, why not just reset the peak and total, but not the current allocations? This way the current values can never disagree with the map.

I had considered this but I don't think it would work. Since the peak value is just peak = max(peak, current), if the current wasn't reset too, the peak won't be reset either since it will just take the current value next allocation.
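The effect described here can be seen with a toy counter (a hypothetical sketch, not the rmm implementation): if a reset zeroes peak but leaves current untouched, the very next allocation restores peak to at least the old current value.

```python
class PeakTracker:
    """Toy counter demonstrating why resetting peak alone is ineffective."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def allocate(self, n):
        self.current += n
        self.peak = max(self.peak, self.current)  # peak = max(peak, current)

    def reset_peak_only(self):
        # Resets peak but deliberately leaves current untouched.
        self.peak = 0


t = PeakTracker()
t.allocate(1000)     # current = 1000, peak = 1000
t.reset_peak_only()  # peak = 0, but current is still 1000
t.allocate(1)        # peak immediately snaps back to 1001
# t.peak == 1001, not 1: the reset was effectively undone
```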

The ideal usage should be to reset after freeing all outstanding allocations, in which case resetting the peak and total would make all things zero.

I see what you are saying, but in our test suite it's difficult to free all GPU memory between tests due to dataset fixtures and shared data between tests. When using this memory resource we created a pytest plugin that will:

  1. Reset the Tracked MR
  2. Run a single test
  3. Garbage Collect
  4. Get the Tracked MR counts

This allows us to determine the memory usage per test (important for running tests in parallel), and find memory leaks on a per test basis if the current allocation count doesn't return to 0.

With all that in mind, I would prefer to keep the reset functionality in there somewhere to allow our pytest plugin to work correctly. But I'm open to other ideas. There are more complex designs that could certainly work, I have just avoided them to keep the design simple and not impact performance.
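The four plugin steps above could be sketched with a stand-in tracker (the class, its reset method, and the helper are hypothetical illustrations, not the rmm or pytest-plugin API):

```python
import gc


class TrackedMR:
    """Stand-in for a tracking memory resource with a reset."""

    def __init__(self):
        self.current_count = 0
        self.current_bytes = 0

    def allocate(self, n):
        self.current_count += 1
        self.current_bytes += n

    def deallocate(self, n):
        self.current_count -= 1
        self.current_bytes -= n

    def reset(self):
        self.current_count = 0
        self.current_bytes = 0


def run_with_tracking(mr, test_fn):
    mr.reset()      # 1. Reset the tracked MR
    test_fn(mr)     # 2. Run a single test
    gc.collect()    # 3. Garbage collect
    # 4. Read the counts: a nonzero current_count suggests a leak
    return mr.current_count, mr.current_bytes


mr = TrackedMR()


def leaky_test(mr):
    mr.allocate(64)  # allocated but never deallocated


count, nbytes = run_with_tracking(mr, leaky_test)
# count == 1 and nbytes == 64, flagging the per-test leak
```

The garbage-collect step matters in practice: Python objects holding device memory may not be freed until collected, so skipping it would over-report per-test usage.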

kkraus14 (Contributor) left a comment


Python / Cython lgtm, thanks!

@harrism
Member

harrism commented Nov 10, 2020

FWIW, I think tests should start with a new memory resource each time the fixture is created. gtests do this. Therefore the memory is completely freed up between tests. This enables tests to be completely independent and not interfere with each other.

I don't feel OK with a current usage that can be negative.

include/rmm/mr/device/tracking_resource_adaptor.hpp — 4 review threads (outdated, resolved)
@mdemoret-nv
Contributor Author

I don't feel OK with a current usage that can be negative.

Got it. I'll see if I can work on another design which works for the cuML team that doesn't allow negative usage numbers.

FWIW, I think tests should start with a new memory resource each time the fixture is created. gtests do this. Therefore the memory is completely freed up between tests. This enables tests to be completely independent and not interfere with each other.

I was hesitant to create a new MR for each test since some of our tests configure the MR directly and I wasn't sure the impact this would have. Additionally, some of the resources held between tests may be resized within a test which would not be tracked by a new MR.

I'll do some experimentation and see what impact creating a new MR for each test has on the results we are looking for.

Thanks for the feedback.

@harrism
Member

harrism commented Nov 11, 2020

@mdemoret-nv you may want to look at what we do in libcudf gtests as well.

harrism added the 3 - Ready for review label Nov 26, 2020
@mdemoret-nv
Contributor Author

@harrism I incorporated your feedback and have updated the design to remove the reset() function which was the main driver for negative current usage values. Initially, I tried your suggestion to create a new memory resource before each test but that proved problematic because many of the tests in cuML will improve performance by caching datasets between tests. This causes Python segfaults since the dataset device_buffer will outlive the device_memory_resource it stores a pointer to. (There are a few other situations this can happen in as well, but there is no need to go into that). No matter what I did to try and change the lifetime of various python objects, our test suite always seemed to crash when creating a new memory resource per test.

Instead, I changed the design to store an internal "stack" of tracked values that you can then push/pop as needed. This doesn't eliminate the possibility of getting negative current allocations (which is technically possible with any device_memory_resource that has an "upstream") but it does significantly reduce the possibility. The methods get_total_allocation_counts() and get_allocated_bytes() should now work as you would expect and return the total values summed over the entire stack.
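A rough sketch of the push/pop design described here (a hypothetical stand-in; the method names mirror the PR's push_allocation_counts()/pop_allocation_counts(), but the class itself is illustrative):

```python
class StackedCounts:
    """Counters kept as a stack of frames; totals sum over the whole stack."""

    def __init__(self):
        self.frames = [{"bytes": 0, "count": 0}]

    def allocate(self, n):
        top = self.frames[-1]
        top["bytes"] += n
        top["count"] += 1

    def deallocate(self, n):
        top = self.frames[-1]
        top["bytes"] -= n
        top["count"] -= 1

    def push_allocation_counts(self):
        # Start a fresh frame, e.g. at the start of a test.
        self.frames.append({"bytes": 0, "count": 0})

    def pop_allocation_counts(self):
        # Return the counts accumulated since the matching push.
        return self.frames.pop()

    def get_allocated_bytes(self):
        # Totals are summed over the entire stack.
        return sum(f["bytes"] for f in self.frames)


s = StackedCounts()
s.allocate(100)               # long-lived fixture data
s.push_allocation_counts()
s.allocate(32)                # allocation made inside a test
per_test = s.pop_allocation_counts()
# per_test["bytes"] == 32, while get_allocated_bytes() still sees the 100
```

This way a test can be measured in isolation without resetting the counters that cover long-lived allocations, which is what avoids the mandatory-reset design.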

I've tested this in the cuML pytest suite and it works well and does what we need in order to find memory leaks and track memory usage per test. It would be great to get this re-reviewed. In addition, I have a few outstanding questions I could use your input on:

  1. Are you ok with the names push_allocation_counts() and pop_allocation_counts()? I didn't think push() and pop() were descriptive enough.
  2. I broke the string generation section from log_outstanding_allocations() into a new method get_outstanding_allocations_str() which allows getting the stack traces from python as well. Could use some input on your naming conventions here as well.
  3. I also added C++ name demangling to stack_trace::operator<<() which is why there are some new diffs. This might be out of the scope of this PR. I can break it up into a separate one if needed.
  4. Finally, I'm not sure what the difference between a MemoryResource and a ResourceAdaptor is. Can you clarify if the python wrapper should be called TrackingMemoryResource or TrackingResourceAdaptor?

Let me know if you have any questions. Thanks for your feedback so far.

harrism (Member) left a comment


I don't like the whole stack thing, it's overcomplicated. If you want to reset the tracking without resetting the upstream memory resource, then delete the tracking_resource_adaptor, and create a new one with the same upstream as the old one.

Keep the tracking simple. No need for a struct of counts, a stack, or any of that.

include/rmm/mr/device/tracking_resource_adaptor.hpp — 4 review threads (outdated, resolved)
Comment on lines 186 to 192
* @brief Returns an allocation_counts struct for this adaptor containing the
* total current, peak, and total number of bytes and allocation count for
* this adaptor regardless of any push/popped allocation_counts. Note: Because
* its possible to change memory resources at any time while maintaining the
* same upstream memory resource, its possible to have a negative allocation
* bytes or count if the number of deallocate() calls is greater than the
* number of allocate().
Review comment (Member):

This is not brief. :) @brief should be one line. Separate detailed docs from @brief with a single blank line.

include/rmm/mr/device/tracking_resource_adaptor.hpp — 1 review thread (outdated, resolved)
Comment on lines 188 to 190
* this adaptor regardless of any push/popped allocation_counts. Note: Because
* its possible to change memory resources at any time while maintaining the
* same upstream memory resource, its possible to have a negative allocation
Review comment (Member):

I don't understand the note, or the reason for a stack. You can't change the upstream for a resource_adaptor, you would have to create a new resource adaptor. You can't have more deallocations than allocations without an exception. This seems way overcomplicated.

@harrism
Member

harrism commented Dec 3, 2020

I'm very sorry for the delay in reviewing @mdemoret-nv. Holidays and then life (outside work) have gotten in the way for the past week+.

@kkraus14
Contributor

kkraus14 commented Dec 3, 2020

Should we push this to 0.18 to avoid changing tracking_resource_adaptor right before release?

auto allocated_bytes = found->second.allocation_size;

if (allocated_bytes != bytes) {
// Don't throw but log an error. Throwing in a descructor (or any noexcept) will call
Review comment (Member):

Suggested change
// Don't throw but log an error. Throwing in a descructor (or any noexcept) will call
// Don't throw but log an error. Throwing in a destructor (or any noexcept) will call

@harrism
Member

harrism commented May 27, 2021

I still think statistics and leak detection should be in two separate memory resource adaptors. Moving to 21.08

mdemoret-nv changed the base branch from branch-21.06 to branch-21.08 June 2, 2021
harrism (Member) left a comment


The latest changes make me happy! Nice separation of concerns. Just a couple of comments.

@@ -81,6 +81,7 @@ endif(CUDA_STATIC_RUNTIME)

target_link_libraries(rmm INTERFACE rmm::Thrust)
target_link_libraries(rmm INTERFACE spdlog::spdlog_header_only)
target_link_libraries(rmm INTERFACE dl)
Review comment (Member):

Is this needed? Quick google shows that dladdr now lives in libc rather than libdl?

Reply from mdemoret-nv (PR author):

With stack traces enabled, this was needed to compile the tests (original comment). Keith and I briefly discussed this here: #626 (comment).

Can you send me the link where you saw that dladdr has moved? All I am seeing from this link is:

Link with -ldl.

Review comment (Member):

Nevermind, the docs I found were not for linux -- Solaris and something called illumos. As I said, it was a quick google.

Review comment (Member):

Have you tested that all is well in libcudf, cuML, etc. when this library is linked here? Note that the other target_link_libraries for RMM are all header-only, which is why this one has me worried (RMM is a header-only library).

Review comment (Member):

@mdemoret reports cuML builds and tests fine against this PR.

harrism (Member) left a comment


Nice work!

harrism requested a review from jrhemstad June 3, 2021
@harrism
Member

harrism commented Jun 3, 2021

Need to update the title and description before merging.

harrism removed the 3 - Ready for review and 5 - Merge After Dependencies labels Jun 3, 2021
mdemoret-nv changed the title from "Add additional tracking info and cython bindings to tracking_resource_adaptor" to "Add Statistics Resource Adaptor and cython bindings to tracking_resource_adaptor and statistics_resource_adaptor" Jun 7, 2021
@mdemoret-nv
Contributor Author

The following is the new description for this PR since it has changed so much during development. Leaving the PR description (aka the first comment) in place to preserve history, for now. New description:

This PR updates the C++ tracking_resource_adaptor with stack trace information and also adds a new MR, statistics_resource_adaptor. Summary of all changes:

  • tracking_resource_adaptor changes:
    • Added Cython wrapper rmm.mr.TrackingResourceAdaptor which wraps all available methods
    • Updated tracking_resource_adaptor to correctly log stack trace information with capture_stacks=True
  • Added statistics_resource_adaptor memory resource:
    • This MR will keep track of the current, peak and total allocated bytes and number of allocations
    • Added Cython wrapper rmm.mr.StatisticsResourceAdaptor which wraps all available methods

These two MRs can be used separately or together to track memory allocations, check for memory leaks, and identify incorrect deallocations. While both MRs can track the current number of allocations, they have different areas of focus.

The tracking_resource_adaptor is geared toward identifying and fixing memory leaks, and logs stack trace information for every memory allocation. This MR has a significant performance impact since it records a large amount of information per allocation.

The statistics_resource_adaptor is a lightweight MR that adds simple counters to track allocated bytes and allocation counts. It has a much smaller performance impact but cannot identify the cause of memory leaks, only that they exist. It is also well suited to tracking peak memory usage and can help identify areas that require large amounts of memory or measure memory usage reductions during optimization.

@harrism
Member

harrism commented Jun 7, 2021

The updated description needs to be in the first (original) comment or our merge scripts won't copy the right description into the changelog.

@harrism
Member

harrism commented Jun 8, 2021

@gpucibot merge

rapids-bot merged commit 2f85111 into rapidsai:branch-21.08 Jun 8, 2021
quasiben mentioned this pull request Aug 2, 2021
Labels: CMake · cpp (Pertains to C++ code) · feature request (New feature or request) · non-breaking (Non-breaking change) · Python (Related to RMM Python API)
9 participants