
[BUG] When using rmm_cupy_allocator with Cupy, the memory resource is constructed twice. #1506

Closed · harrism opened this issue on Mar 22, 2024 · 3 comments · Fixed by #1514
Labels: bug (Something isn't working) · Python (Related to RMM Python API)

harrism commented Mar 22, 2024

Describe the bug
I added print statements to the constructor and destructor of the C++ cuda_memory_resource and then set Cupy to use RMM's allocator. The constructor print statement is executed twice.

Steps/Code to reproduce bug
Add print statements to the constructor and destructor (and optionally allocate/deallocate functions) of cuda_memory_resource. Then run this code:

>>> import rmm
>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy as cp
>>> cp.cuda.set_allocator(rmm_cupy_allocator)
>>> array = cp.ndarray(1000)
cuda_memory_resource constructor
do_allocate
cuda_memory_resource constructor
>>> del array
do_deallocate

Notice cuda_memory_resource constructor prints twice, showing that the C++ object is constructed twice.

Expected behavior
cuda_memory_resource constructor should only print once (I think).

Environment details (please complete the following information):

  • Environment location: [Docker / dev container]
  • Method of RMM install: [dev container]
harrism added the bug (Something isn't working), Python (Related to RMM Python API), and ? - Needs Triage (Need team to review and classify) labels on Mar 22, 2024

harrism commented Mar 22, 2024

I can't see immediately why this would happen. I believe this just goes to DeviceBuffer, which goes to the C++ device_buffer, which calls get_current_device_resource() in its constructor. That function calls initial_resource() if set_current_device_resource() was never called, and initial_resource() constructs the cuda_memory_resource the first time it is called (stored in a function-local static). It shouldn't be called twice, and even if it were, the static value would be returned.
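For reference, the pattern being described is roughly the following function-local static (a paraphrase of initial_resource(), not the exact RMM source):

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>

// Paraphrase of the pattern described above: the static is constructed the
// first time this function runs, and every later call returns the same
// object, so this path alone cannot construct cuda_memory_resource twice.
inline rmm::mr::device_memory_resource* initial_resource()
{
  static rmm::mr::cuda_memory_resource mr{};
  return &mr;
}
```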

shwina self-assigned this on Mar 26, 2024
shwina removed the ? - Needs Triage (Need team to review and classify) label on Mar 26, 2024

shwina commented Mar 26, 2024

This ultimately comes down to Python creating a cuda_memory_resource (first), and then calling set_per_device_resource, which itself constructs a cuda_memory_resource (second).

I think it's harmless as long as the initial_resource() is a cuda_memory_resource and as long as cuda_memory_resource objects are cheap to create.
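A rough C++ sketch of the sequence being described (hypothetical variable names; the real flow goes through the Cython layer):

```cpp
#include <rmm/cuda_device.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <memory>

int main()
{
  // (1) The Python-level CudaMemoryResource wrapper owns its own C++ instance.
  auto python_owned = std::make_shared<rmm::mr::cuda_memory_resource>();

  // (2) Registering it consults the per-device bookkeeping; with no prior
  //     entry, that falls back to initial_resource(), whose function-local
  //     static constructs a second cuda_memory_resource.
  rmm::mr::set_per_device_resource(rmm::cuda_device_id{0}, python_owned.get());
  return 0;
}
```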

wence- added a commit to wence-/rmm that referenced this issue Apr 2, 2024
Previously we were relying on the C++ and Python-level device
resources to agree. But this need not be the case.

To avoid this, first get the current device resource and then use it
when allocating the wrapped C++ device_buffer when creating
DeviceBuffers.

- Closes rapidsai#1506

wence- commented Apr 2, 2024

This is a consequence of a confluence of factors:

  1. DeviceBuffer on the Python side doesn't accept an mr parameter and so uses get_current_device_resource to obtain an mr.
  2. get_current_device_resource in Python cannot call get_current_device_resource from C++ (because that hands out raw pointers, which can't be lifetime-managed).
  3. We don't pass an mr into the allocation of the C++ device_buffer that is wrapped inside DeviceBuffer, so that device_buffer constructor then calls the C++ get_current_device_resource.

This is only harmless because get_current_device_resource (without a set_current_device_resource first) returns a cuda_memory_resource, and all cuda_memory_resources are interchangeable. If it returned a more complicated default resource, then we would be out of luck (since we'd end up allocating a device buffer with resource-A and attempting to free it with resource-B). This is another example of #1492 raising its head.
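To make the failure mode concrete, here is a contrived sketch with two independently constructed pool resources (hypothetical; today's default is a plain cuda_memory_resource, which is why the mismatch stays harmless):

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main()
{
  rmm::mr::cuda_memory_resource upstream;

  // Imagine the Python layer and the C++ layer each constructed their own
  // "default" resource instead of sharing one:
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> resource_a{&upstream, 1u << 22};
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> resource_b{&upstream, 1u << 22};

  void* ptr = resource_a.allocate(1024);

  // Returning the allocation to the other pool is undefined behaviour: the
  // pointer was never handed out by resource_b. Interchangeable
  // cuda_memory_resource objects are the only reason the current mismatch
  // goes unnoticed.
  resource_b.deallocate(ptr, 1024);
  return 0;
}
```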

#1514 is a narrow fix for this issue, but we need to go through all the Cython interfaces and make sure we are explicit about tracking memory resources.

wence- assigned themselves and unassigned shwina on Apr 2, 2024
harrism moved this from Todo to In Progress in the RMM Project Board on Apr 3, 2024
wence- added a commit to wence-/rmm that referenced this issue Apr 3, 2024
rapids-bot closed this as completed in #1514 on Apr 3, 2024
rapids-bot pushed a commit that referenced this issue on Apr 3, 2024
Previously we were relying on the C++ and Python-level device resources to agree. But this need not be the case.

To avoid this, first get the current device resource and then use it when allocating the wrapped C++ device_buffer when creating DeviceBuffers.

- Closes #1506

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Mark Harris (https://github.com/harrism)

URL: #1514
github-project-automation moved this from In Progress to Done in the RMM Project Board on Apr 3, 2024