
MPI binding issue #108

Open
dsroberts opened this issue Mar 15, 2024 · 3 comments

Comments

@dsroberts

Whatever is running under the hood in mop run is using OpenMPI in combination with multiprocessing. What this means is that in the conda_concept environments, every multiprocessing process winds up bound to core 0, which causes very poor performance. See the output I just captured from top:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                  P
3784739 dr4292    20   0 9551920 855768 141916 R   8.0   0.2   0:05.82 python3                  0
3784743 dr4292    20   0 9284928 741448  93716 R   7.7   0.1   0:03.09 python3                  0
3784744 dr4292    20   0 9553016 856444 142424 R   7.7   0.2   0:06.88 python3                  0
3784745 dr4292    20   0 9552076 856052 141976 R   7.7   0.2   0:06.88 python3                  0
3784748 dr4292    20   0 9553360 857132 142744 R   7.7   0.2   0:09.41 python3                  0
3784749 dr4292    20   0 9551852 856148 142380 R   7.7   0.2   0:05.71 python3                  0
3784750 dr4292    20   0 9553244 855944 141872 R   7.7   0.2   0:07.82 python3                  0
3784752 dr4292    20   0 9553048 855736 141852 R   7.7   0.2   0:06.76 python3                  0
3784740 dr4292    20   0 9551916 856272 142624 R   7.3   0.2   0:05.52 python3                  0
3784741 dr4292    20   0 9553240 857804 143576 R   7.3   0.2   0:07.39 python3                  0
3784746 dr4292    20   0 9553516 857048 142348 R   7.3   0.2   0:10.11 python3                  0
3784751 dr4292    20   0 9551916 855568 141984 R   7.3   0.2   0:05.44 python3                  0
3784742 dr4292    20   0 9553612 855976 141260 S   3.7   0.2   0:11.10 python3                  0
3784747 dr4292    20   0 9552520 856800 142136 D   3.3   0.2   0:10.42 python3                  0

In the base conda environments, everything would instead be bound to NUMA rank 0, which could still be problematic if jobs span multiple NUMA nodes. A quick fix would be to turn binding off entirely with `export OMPI_MCA_hwloc_base_binding_policy=none` in the job script. That said, figuring out what's actually going on here would be useful, as this behaviour also breaks some tests in the installation of the analysis3 environments.
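As a sketch of what an equivalent fix could look like on the Python side (not code from mop itself; the initializer and worker names here are hypothetical, and `os.sched_setaffinity` is Linux-only), the inherited binding can also be undone per worker when the pool is created:

```python
import os
from multiprocessing import Pool

def _unpin_worker():
    # Workers forked under OpenMPI inherit an affinity mask pinned to core 0;
    # widen it back to every core the OS reports for this process.
    os.sched_setaffinity(0, range(os.cpu_count()))

def process_variable(task):
    # Placeholder for the per-variable CMOR work.
    return task

if __name__ == "__main__":
    with Pool(processes=14, initializer=_unpin_worker) as pool:
        pool.map(process_variable, range(14))
```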

@dsroberts
Author

Oh, it gets worse. Each of the processes is launching ncpus threads too.

[dr4292@gadi-cpu-spr-0474 ~]$ ps -eLF | grep -c python
2934
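If those extra threads come from implicit OpenMP/BLAS/numexpr thread pools inside each worker (an assumption; the output above doesn't identify where they originate), a common mitigation is to cap them before the workers are forked, for example:

```python
import os

# Cap implicit thread pools before multiprocessing workers are forked, so each
# worker stays single-threaded instead of spawning ~ncpus threads of its own.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS",
            "OPENBLAS_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ.setdefault(var, "1")
```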

@paolap
Collaborator

paolap commented Mar 17, 2024

We're stuck with CMOR, so I can't comment on that side. So far the processing has been fast enough that this hasn't been an issue, though overall it is wasteful. We've been thinking of ways to move away from "Pool", possibly by opening the files once and processing more variables per open, but in reality you're always processing a different combination of variables, so it's easier to use "Pool". Anything with similar functionality but more efficient/better settings would be great. We could discuss this at the meeting today.
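For reference, one shape such a change could take while keeping the same "Pool" semantics (a sketch only; `process_variable`, `variables` and the worker count are placeholders, not mop's actual code):

```python
from multiprocessing import get_context

def run_all(process_variable, variables, workers=14):
    # Same map-over-variables pattern as Pool.map, but with an explicit start
    # method, recycled workers, and chunked unordered dispatch so slow
    # variables don't hold up scheduling of the rest.
    ctx = get_context("forkserver")
    with ctx.Pool(processes=workers, maxtasksperchild=8) as pool:
        yield from pool.imap_unordered(process_variable, variables, chunksize=4)
```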

@paolap
Collaborator

paolap commented Mar 18, 2024

Updating after the conversation: possible culprits are CMOR itself and/or dask (which is used mostly in the background; we aren't using a client).
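If dask turns out to be the culprit, one low-effort experiment (an assumption about where the threads come from, not something verified above) is to force its single-threaded scheduler inside the workers, since no distributed client is being used anyway:

```python
import dask

# With no Client, dask defaults to a threaded scheduler in every worker
# process; the synchronous scheduler removes those per-worker thread pools.
dask.config.set(scheduler="synchronous")
```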
