-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MOM_hor_visc: Variables moved to stack #1362
MOM_hor_visc: Variables moved to stack #1362
Conversation
New diagnostics to horizontal_viscosity were causing issues with stack memory on some platforms, causing the runtime to more than double. Two of the diagnostics were allocatables and the other two were local variables. By redefining the two allocatables as locals (and presumably moving to stack), the faster performance was restored. While the underlying cause is unclear, this is almost certainly due to stack spill in this function, which happens to have a large number of local arrays - including many 3d arrays used to gather diagnostics - and any new variable is going to have volatile consequences. This should be seen as a short term fix. In the future, we need better tools to detect this problem and better guidance on how to responsibly use stack. Also note that two variables were removed: `max_diss_rate_[qh]`. Neither variable was used in the function.
de08370
to
2ffea27
Compare
@marshallward, it seems like you're saying there were issues with stack memory and that you've resolved it by placing more variables on the stack. This seems backwards to me, so I think that I must be misunderstanding. Based on your description of the code's performance, it smells like the original was slower because of the now removed allocate statements. But you're attributing the performance to stack spills. I'm not following your explanation. Could you clarify this? |
@klindsay28 I agree! There isn't an explanation here, just a vague assertion that we're using too much stack, and that changing how we use it somehow(!) resolved the problem. What I can say is that this function suffers from many odd symptoms which suggest stack spill:
I don't mean to assert that this is reducing the stack spill and fixing the underlying problem. Only that there is a problem - caused by the huge number of local arrays, probably on stack - and that any little tweak can either fix or exacerbate the problem. As for why adding two more variables magically fixes the issue that I saw? I don't know. Could be some odd interaction with how allocatable and locals are managed. This should not be seen as anything more than a short-term fix, and it would be good to resolve the bigger problem of excessive locals. |
Another point which I forgot to make: I was not even testing these blocks of code! None of these diagnostics were enabled. Yet their presence was causing the runtime to more than double. So it was not the actual allocation (or lack of) causing the issue. It was the presence of these particular variables and loops, and resulting object code, which was causing a problem for the rest of the function. To say it another way: the mere possibility of such variables may have forced the compiler to make some awful compromises in order to accommodate the worst case scenario. I'm asserting (without much evidence) that if we had fewer locals, then we'd have more stack and the compiler would not feel so trapped and would not be forced to make awkward choices regarding additional heap and stack variables, and the whole problem would just go away. So perhaps moving the two diagnostics from heap to stack was enough to keep the memory under control. I apologize for not providing this context, but I felt it was far too speculative to put in the commit log. |
Thanks for the explanatory comments @marshallward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am convinced that these changes are correct and improve model performance. In addition to passing the TC testing, this PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/12386.
New diagnostics to horizontal_viscosity were causing issues with stack
memory on some platforms, causing the runtime to more than double.
Two of the diagnostics were allocatables and the other two were local
variables. By redefining the two allocatables as locals (and presumably
moving to stack), the faster performance was restored.
While the underlying cause is unclear, this is almost certainly due to
stack spill in this function, which happens to have a large number of
local arrays - including many 3d arrays used to gather diagnostics - and
any new variable is going to have volatile consequences.
This should be seen as a short term fix. In the future, we need better
tools to detect this problem and better guidance on how to responsibly
use stack.
Also note that two variables were removed:
max_diss_rate_[qh]
.Neither variable was used in the function.