Whatever is running under the hood in `mop run` uses OpenMPI in combination with multiprocessing. As a result, in the conda_concept environments every multiprocessing process ends up bound to core 0, which causes very poor performance. See the output I just captured from `top`:
In the base conda environments, everything would be bound to NUMA rank 0, which could be problematic if jobs span multiple NUMA nodes. A quick fix would be to turn binding off entirely with `export OMPI_MCA_hwloc_base_binding_policy=none` in the job script. It would still be useful to figure out what's going on here, as this breaks some tests in the installation of the analysis3 environments.
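To confirm whether workers really inherit a narrow binding, something like the following could be run inside the job, with and without the `export` above. It is only a diagnostic sketch: `report_affinity` and `collect_affinities` are hypothetical helper names, not part of `mop`, and it assumes Linux (`os.sched_getaffinity`) and the `fork` start method. It also only shows a narrowed mask if the parent process has already been bound, for example after MPI initialisation.

```python
import os
import multiprocessing as mp


def report_affinity(_):
    """Return this worker's PID and the sorted list of cores it may run on."""
    # os.sched_getaffinity is Linux-only; 0 means "the calling process".
    return os.getpid(), sorted(os.sched_getaffinity(0))


def collect_affinities(nworkers=4):
    """Start a Pool and collect the affinity mask seen by each task."""
    # Assumption: fork start method, so workers inherit the parent's binding.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=nworkers) as pool:
        return pool.map(report_affinity, range(nworkers))


if __name__ == "__main__":
    print("parent:", sorted(os.sched_getaffinity(0)))
    for pid, cores in collect_affinities():
        print(pid, cores)
```

If every worker reports a single core (or only the cores of one NUMA node) while the job was allocated more, the binding is being inherited from the parent.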
We're stuck with CMOR, so I can't comment on that. So far the processing has been fast enough that this hasn't been an issue, though overall it is wasteful. We've been thinking of ways to move away from "Pool", possibly by opening the files once to process more variables, but in reality you're always processing a different combination of variables, so it's easier to use "Pool". Anything with similar functionality but more efficient or with better settings would be great. We could discuss this at the meeting today.
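One option that keeps "Pool" but avoids the inherited binding is to reset the affinity in each worker through the Pool's `initializer`. This is a rough sketch, not what `mop` currently does: `widen_affinity` and `run_with_open_affinity` are hypothetical names, and it assumes Linux and the `fork` start method. Whether it is preferable to disabling binding in OpenMPI would need testing on the actual system.

```python
import os
import multiprocessing as mp


def widen_affinity():
    """Pool initializer: ask for every core, undoing a binding inherited from the parent."""
    # Linux-only. The kernel may still restrict the mask to the CPUs
    # the job is actually allowed to use (e.g. via cpusets).
    ncores = os.cpu_count() or 1
    os.sched_setaffinity(0, range(ncores))


def run_with_open_affinity(func, items, nworkers=4):
    """Map func over items using workers whose affinity has been reset."""
    ctx = mp.get_context("fork")  # assumption: fork start method
    with ctx.Pool(processes=nworkers, initializer=widen_affinity) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    # abs stands in for the real per-variable processing function.
    print(run_with_open_affinity(abs, [-1, 2, -3], nworkers=2))
```

The initializer runs once per worker, so the per-task code does not need to change.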