Reduce redundant GPU allocations #2393
Conversation
This is nice! The previously high memory consumption prevented the execution of my large network application with multi-compartment neurons (see here for the code of the single-compartment case; the code for the multi-compartment case is not public yet) on common GPUs. The present PR reduces the GPU memory consumption for that application from >> 8 GB to ~100 MB. Also, the full suite of tests for my network application passes (for both the single-compartment and the multi-compartment case).
You can also try #2394, which takes the concept one step further.
Spack is failing due to pybind11-stubgen not being updated. I made the corresponding PR here.
The conditional initialization and allocation of the different ion state arrays is not very easy to follow. There must be a better way to do this, but that's maybe for another PR.
Co-authored-by: boeschf <[email protected]>
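For what it's worth, one conceivable direction, sketched with a hypothetical helper rather than code from this PR (`alloc_if` and the condition names are made up, and host-side `std::vector` stands in for the GPU array type): routing every conditional array through one function keeps each existence condition next to its allocation.

```c++
#include <cstddef>
#include <vector>

// Hypothetical helper, not Arbor API: allocate an array only when the
// predicate holds, so every conditional member documents its own condition.
template <typename T>
std::vector<T> alloc_if(bool cond, std::size_t n, T fill = T{}) {
    return cond ? std::vector<T>(n, fill) : std::vector<T>{};
}

// Usage sketch, with conditions as in the PR description:
//   auto Xd       = alloc_if<double>(uses_diffusion, n_cv);
//   auto reset_Xi = alloc_if<double>(writes_Xi, n_cv);
//   auto reset_Xo = alloc_if<double>(writes_Xo, n_cv);
```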
Can we add a Spack variant that would disable the pybind11 stubgen? This would be consistent with the CMake side and would allow the Spack CI to still pass.
Thanks, looks good to me.
Introduction
Reasoning: If concentrations are never changed, we do not reset them and thus do not need to store the values.
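As a rough illustration of that reasoning (a host-side sketch; the struct and member names are mine and `std::vector` stands in for Arbor's actual GPU storage types):

```c++
#include <cstddef>
#include <optional>
#include <vector>

// Illustrative only: the live concentration array always exists, but the
// reset/init copies are materialized only if some mechanism writes to it.
struct ion_state_sketch {
    std::vector<double> Xi;                       // internal concentration, per CV
    std::optional<std::vector<double>> reset_Xi;  // absent if Xi is never written
    std::optional<std::vector<double>> init_Xi;   // absent if Xi is never written

    ion_state_sketch(std::size_t n_cv, bool xi_written): Xi(n_cv) {
        if (xi_written) {
            reset_Xi.emplace(n_cv);
            init_Xi.emplace(n_cv);
        }
    }

    void reset() {
        // If nothing ever writes Xi, there is nothing to undo: skip the
        // copy and save the storage for the two extra arrays.
        if (reset_Xi) Xi = *reset_Xi;
    }
};
```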
Solution

Elide the redundant arrays from the GPU solver. This saves per CV:

- 1 x 8B for `cv_area`, unconditionally
- 1 x 8B for `Xd`, for each ion if no diffusion is in use (majority of cases)
- 2 x 8B for `Xi` (`reset` and `init`), for each ion if not written (reasonably often)
- 2 x 8B for `Xo` (`reset` and `init`), for each ion if not written (majority of cases)
- 1 x 8B for the `eX` reset buffer, for each ion if not read (majority)
- 1 x 8B for `eX`, for each ion if not read (rarely)

In my standard benchmark, `busyring` with complex cells, this saves about 18% of the total GPU allocation for the cell data (`shared_state`).
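For a sense of scale, a back-of-the-envelope tally of the list above; the three-ion configuration and the flags are assumptions for illustration, not numbers from the benchmark.

```c++
#include <cstdio>

// Assumed configuration: 3 ions, no diffusion, Xi/Xo never written,
// eX reset buffer elided but eX itself kept. Doubles are 8B each.
int main() {
    constexpr long long ions = 3, B = 8;
    long long saved = 0;
    saved += 1 * B;        // cv_area, unconditionally
    saved += ions * 1 * B; // Xd: no diffusion in use
    saved += ions * 2 * B; // Xi reset + init: not written
    saved += ions * 2 * B; // Xo reset + init: not written
    saved += ions * 1 * B; // eX reset buffer: not read
    std::printf("%lld B saved per CV\n", saved); // prints: 152 B saved per CV
}
```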
This has become a mixed bag, fixing a few additional things that came up during testing this: