
Residual GPU Memory usage #96

Open
r614 opened this issue Aug 19, 2022 · 3 comments
r614 commented Aug 19, 2022

hi! I'm trying to use the scanpy RAPIDS functions to run multiple parallel operations on a server.

the problem I'm running into is that after running any scanpy function with RAPIDS enabled, there is some residual memory usage after the function call has ended. I'm assuming this is either because of a memory leak, or because the result itself is stored on the GPU.

during the scanpy.pp.neighbors + scanpy.tl.umap call:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   51C    P0    60W /  70W |   7613MiB / 15109MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

after the function calls return:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   51C    P0    35W /  70W |   1564MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

we aren't running any GPU load besides the UMAP function, and idle memory usage is ~75 MiB.

happy to elaborate more and help find a fix for this. not sure if I'm missing something really easy (maybe a cupy.asnumpy somewhere?), so any info would be super helpful!
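
for context, a minimal sketch of what each task looks like (the h5ad path is a placeholder, and preprocessing steps are omitted):

```python
# minimal sketch of one task; 'data.h5ad' is a placeholder path
import scanpy as sc

adata = sc.read_h5ad('data.h5ad')
sc.pp.neighbors(adata, method='rapids')  # GPU-accelerated kNN graph
sc.tl.umap(adata, method='rapids')       # GPU-accelerated UMAP
# at this point nvidia-smi still reports ~1.5GiB of device memory in use
```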

r614 changed the title from "Move data off-GPU after function calls" to "Residual GPU Memory usage" on Aug 19, 2022

r614 commented Aug 19, 2022

follow-up issues after more experimentation, not sure if related:

  • UVM doesn't work when computing PCA for a large dataset ((64736, 24929) in our case): it crashes with CUSOLVER_STATUS_EXECUTION_FAILED (see cupy/cupy#3127, "About unified memory in Cupy"); we enable it roughly as in the sketch below
  • when an async loop crashes with a CUDA OOM, the CUDA memory isn't actually freed
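
(rough sketch of how we turn on managed memory, assuming plain CuPy allocation; this may not match what scanpy's RAPIDS path does internally:)

```python
# sketch: route all CuPy allocations through CUDA managed (unified) memory
# so the GPU can oversubscribe into host RAM
import cupy as cp

cp.cuda.set_allocator(cp.cuda.malloc_managed)

x = cp.zeros((64736, 24929), dtype=cp.float32)  # allocated as managed memory
```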


cjnolet commented Aug 25, 2022

@r614 a CUDA context is created on the GPU before any kernels are executed, and it stores some metadata and other things like loaded libraries. The CUDA context is usually initialized when calls are made to the CUDA runtime API (such as launching a kernel, for example) and generally lasts for the lifetime of the process. This small amount of memory (in the range of tens to hundreds of MB) is expected.

IIRC, Scanpy will copy results back to the CPU, and the GPU memory should eventually be cleaned up when the corresponding Python objects are cleaned up. However, it's always possible this might not happen immediately and might require waiting for the garbage collector.
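
As a rough illustration (assuming CuPy's default memory pool is in use; scanpy's RAPIDS path may configure allocation differently), you can nudge this along manually:

```python
# sketch: force Python garbage collection, then release cached device
# memory held by CuPy's default memory pools back to the driver
import gc
import cupy as cp

gc.collect()
cp.get_default_memory_pool().free_all_blocks()
cp.get_default_pinned_memory_pool().free_all_blocks()
```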

Managed memory is a bit of an exception to the above. You can use it to oversubscribe the GPU memory so you don't immediately get out-of-memory errors, but that comes at the cost of increased thrashing potential as memory is paged into and out of the GPU as needed. Unfortunately, PCA does require computing the eigenpairs of a covariance matrix, which in your case would require 24929^2 entries; that's ~2.5GB of 32-bit float values.
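
(quick back-of-the-envelope check of that figure:)

```python
# size of the covariance matrix for 24929 columns in float32
n_cols = 24929
cov_bytes = n_cols ** 2 * 4           # 4 bytes per float32 entry
print(f"{cov_bytes / 1e9:.2f} GB")    # -> 2.49 GB
```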

I recall that at one point there was an additional limit imposed by the eigensolver itself (from cusolver directly), which wouldn't allow the number of columns squared to be larger than 2^31. This seems like it might be the case here. Can you print the output of conda list? I think this bug might have been fixed recently, but I can't recall whether the fix is in CUDA 11.5.

Another benefit of the highly variable gene feature selection we do in our examples is that it avoids these limitations in the PCA altogether.


r614 commented Aug 29, 2022

thanks for the detailed reply!

do you know if there is a workaround for forcing the creation of a new context/garbage collection at the API level, maybe something akin to torch.cuda.empty_cache()? the garbage collection doesn't seem to trigger even after long periods of inactivity, and wrapping each scanpy/CUDA task in its own sub-process (as in the sketch below) would add a lot of complexity.
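
(roughly what the sub-process approach would look like, with placeholder paths; it works, but gets heavy with many small tasks:)

```python
# sketch: run each GPU task in a spawned worker process, so the CUDA
# context (and all device memory with it) is released when the worker exits
import multiprocessing as mp

def run_umap(in_path, out_path):
    import scanpy as sc
    adata = sc.read_h5ad(in_path)
    sc.pp.neighbors(adata, method='rapids')
    sc.tl.umap(adata, method='rapids')
    adata.write_h5ad(out_path)

if __name__ == '__main__':
    ctx = mp.get_context('spawn')  # fresh process, fresh CUDA context
    p = ctx.Process(target=run_umap, args=('in.h5ad', 'out.h5ad'))
    p.start()
    p.join()
```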

will post the conda output once I get my environment up again later today. would love to get a stable managed-memory setup working: what GPU memory size would you recommend for computations on a dataset of this size? we hit this on a 16GB GPU, and ran into OOM issues without unified memory.
