
[BUG] When using rmm_cupy_allocator with Cupy, the memory resource is constructed twice. #1506

Closed · harrism opened this issue on Mar 22, 2024 · 3 comments · Fixed by #1514
Labels: bug (Something isn't working) · Python (Related to RMM Python API)

harrism commented Mar 22, 2024

Describe the bug
I added print statements to the constructor and destructor of the C++ cuda_memory_resource and then set Cupy to use RMM's allocator. The constructor print statement is executed twice.

Steps/Code to reproduce bug
Add print statements to the constructor and destructor (and optionally allocate/deallocate functions) of cuda_memory_resource. Then run this code:

>>> import rmm
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy as cp
>>> cp.cuda.set_allocator(rmm_cupy_allocator)
>>> array = cp.ndarray(1000)
cuda_memory_resource constructor
do_allocate
cuda_memory_resource constructor
>>> del array
do_deallocate

Notice cuda_memory_resource constructor prints twice, showing that the C++ object is constructed twice.

Expected behavior
cuda_memory_resource constructor should only print once (I think).

Environment details (please complete the following information):

  • Environment location: [Docker / dev container]
  • Method of RMM install: [dev container]
harrism added the bug (Something isn't working), Python (Related to RMM Python API), and ? - Needs Triage (Need team to review and classify) labels on Mar 22, 2024

harrism commented Mar 22, 2024

I can't see immediately why this would happen. I believe this just goes to DeviceBuffer, which goes to the C++ device_buffer, which calls get_current_device_resource() in its constructor. That function calls initial_resource() if set_current_device_resource() was never called, and initial_resource() constructs the cuda_memory_resource the first time it is called (stored in a function-local static). It shouldn't be called twice, and even if it were, the static value would be returned.
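For reference, the pattern being described is roughly the following function-local static (a paraphrase of initial_resource(), not the exact RMM source):

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>

// Paraphrase of the pattern described above: the static is constructed the
// first time this function runs, and every later call returns the same
// object, so this path alone cannot construct cuda_memory_resource twice.
inline rmm::mr::device_memory_resource* initial_resource()
{
  static rmm::mr::cuda_memory_resource mr{};
  return &mr;
}
```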

shwina self-assigned this on Mar 26, 2024
shwina removed the ? - Needs Triage (Need team to review and classify) label on Mar 26, 2024

shwina commented Mar 26, 2024

This ultimately comes down to Python creating a cuda_memory_resource (first), and then calling set_per_device_resource, which itself constructs a cuda_memory_resource (second).

I think it's harmless as long as the initial_resource() is a cuda_memory_resource and as long as cuda_memory_resource objects are cheap to create.
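A rough C++ sketch of the sequence being described (hypothetical variable names; the real flow goes through the Cython layer):

```cpp
#include <rmm/cuda_device.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <memory>

int main()
{
  // (1) The Python-level CudaMemoryResource wrapper owns its own C++ instance.
  auto python_owned = std::make_shared<rmm::mr::cuda_memory_resource>();

  // (2) Registering it consults the per-device bookkeeping; with no prior
  //     entry, that falls back to initial_resource(), whose function-local
  //     static constructs a second cuda_memory_resource.
  rmm::mr::set_per_device_resource(rmm::cuda_device_id{0}, python_owned.get());
  return 0;
}
```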

wence- added a commit to wence-/rmm that referenced this issue Apr 2, 2024
Previously we were relying on the C++ and Python-level device
resources to agree. But this need not be the case.

To avoid this, first get the current device resource and then use it
when allocating the wrapped C++ device_buffer when creating
DeviceBuffers.

- Closes rapidsai#1506

wence- commented Apr 2, 2024

This is a consequence of a confluence of factors:

  1. DeviceBuffer on the Python side doesn't accept an mr parameter and so uses get_current_device_resource to obtain an mr.
  2. get_current_device_resource in Python cannot call get_current_device_resource from C++ (because that hands out raw pointers, which can't be lifetime-managed).
  3. We don't pass an mr into the allocation of the C++ device_buffer that is wrapped inside DeviceBuffer, so that device_buffer constructor then calls the C++ get_current_device_resource.

This is only harmless because get_current_device_resource (without a set_current_device_resource first) returns a cuda_memory_resource, and all cuda_memory_resources are interchangeable. If it returned a more complicated default resource, then we would be out of luck (since we'd end up allocating a device buffer with resource-A and attempting to free it with resource-B). This is another example of #1492 raising its head.
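To make the failure mode concrete, here is a contrived sketch with two independently constructed pool resources (hypothetical; today's default is a plain cuda_memory_resource, which is why the mismatch stays harmless):

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main()
{
  rmm::mr::cuda_memory_resource upstream;

  // Imagine the Python layer and the C++ layer each constructed their own
  // "default" resource instead of sharing one:
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> resource_a{&upstream, 1u << 22};
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> resource_b{&upstream, 1u << 22};

  void* ptr = resource_a.allocate(1024);

  // Returning the allocation to the other pool is undefined behaviour: the
  // pointer was never handed out by resource_b. Interchangeable
  // cuda_memory_resource objects are the only reason the current mismatch
  // goes unnoticed.
  resource_b.deallocate(ptr, 1024);
  return 0;
}
```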

#1514 is a narrow fix for this issue, but we need to go through all the Cython interfaces and make sure we are explicit about tracking memory resources.

wence- assigned themselves and unassigned shwina on Apr 2, 2024
harrism moved this from Todo to In Progress in the RMM Project Board on Apr 3, 2024
wence- added a commit to wence-/rmm that referenced this issue Apr 3, 2024
rapids-bot closed this as completed in #1514 on Apr 3, 2024
rapids-bot pushed a commit that referenced this issue on Apr 3, 2024
Previously we were relying on the C++ and Python-level device resources to agree. But this need not be the case.

To avoid this, first get the current device resource and then use it when allocating the wrapped C++ device_buffer when creating DeviceBuffers.

- Closes #1506

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Mark Harris (https://github.com/harrism)

URL: #1514
github-project-automation moved this from In Progress to Done in the RMM Project Board on Apr 3, 2024