Fixing openmp issues with FMS2020 cpu affinity #1148
- FMS2020 has newer CPU affinity work. It is mostly there to fix the issues with thread placement and hyperthreading under Slurm on Gaea, but it also works on Orion.
- The new affinity module simplifies the thread-placing calls in the component models.
- The names of some functions have changed, which is the reason for crashes like: FATAL: input domain does not have an io_domain.
- This update fixes those issues.
- OpenMP runs with 1 and 2 threads give the same answers as non-OpenMP runs.
- NOTE: I don't remember why we put the thread-placing calls in MOM_domains.F90. They look unnecessary, and the whole #ifndef NOT_SET_AFFINITY block can probably be removed. ocean_nthreads is set in either the coupler or solo_driver.
Codecov Report
@@ Coverage Diff @@
## dev/gfdl #1148 +/- ##
============================================
- Coverage 46.08% 45.78% -0.31%
============================================
Files 214 223 +9
Lines 69399 69835 +436
============================================
- Hits 31984 31972 -12
- Misses 37415 37863 +448
(Relaying what we discussed with Rusty) It seems that the fms_affinity_* functions require explicit CPU affinities and will fail if more CPUs are available than requested. For now, I think we may be able to resolve this by adding an environment variable to our single-thread OpenMP tests:
I will make this update and then will re-run these tests to see if it resolves the problem.
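To make the condition described above concrete, here is a minimal Python sketch that compares the number of CPUs a process is allowed to run on against the number of threads it requested. It is only an illustration of the "more CPUs available than requested" situation, not the logic inside the fms_affinity_* routines, and it is not the environment variable referred to in the comment. Reading the request from `OMP_NUM_THREADS` is an assumption, and the helper names (`allowed_cpu_count`, `requested_threads`, `has_surplus_cpus`) are hypothetical.

```python
import os


def allowed_cpu_count():
    """Return how many CPUs this process is allowed to run on."""
    # os.sched_getaffinity is Linux-only; fall back to the total CPU count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def requested_threads(env=None):
    """Read the requested thread count from OMP_NUM_THREADS (assumed source, default 1)."""
    env = os.environ if env is None else env
    raw = env.get("OMP_NUM_THREADS", "1")
    # OMP_NUM_THREADS may be a comma-separated list for nested parallelism;
    # only the first (outermost) value is used here.
    return int(raw.split(",")[0].strip())


def has_surplus_cpus(available, requested):
    """True when more CPUs are available than threads were requested."""
    return available > requested


if __name__ == "__main__":
    available = allowed_cpu_count()
    requested = requested_threads()
    print(f"allowed CPUs: {available}, requested threads: {requested}")
    if has_surplus_cpus(available, requested):
        print("more CPUs are available than were requested")
```

Running this inside the same job environment as a single-thread OpenMP test would show whether that test sees more CPUs than the one thread it asks for.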
Using https://travis-ci.org/github/marshallward/MOM6/jobs/704030161 Not sure if this is really what we want or the way we expect it to be run, but it will keep the tests passing.
Handling manually to fix white space issues.
This update fixes the mom6-solo test crashes with openmp with symptoms
FATAL: input domain does not have an io_domain.
With this update, openmp runs with 1 and 2 threads give the same answers as non-openmp runs
for all 3 compilers.
FMS2020 has newer cpu affinity work. These changes are mostly to fix the
issues with thread placing and hyperthreading under slurm on gaea,
but they also work on Orion.
The new affinity module simplifies the thread-placing calls in the
component models.
NOTE: I don't remember why we put the thread placing calls in MOM_domains.F90.
They look unnecessary and the whole #ifndef NOT_SET_AFFINITY block
can probably be removed. ocean_nthreads is set in either the coupler or solo_driver.
The only piece I am not sure about is how to set hyperthreading to false for Ocean and true for ATM in coupled runs.