
[MRG] Fix GUI MPI available cores #871

Open · wants to merge 9 commits into master

Conversation

gtdang
Collaborator

@gtdang gtdang commented Aug 23, 2024

The GUI's MPI backend was not working on HPC systems where the number of cores partitioned to an instance is less than the total number of cores on a node.

This fix uses the number of cores available to the process instead of the total number of cores on the system.

Work done

  1. Use psutil to get the number of cores available to the process instead of the total system cores on Linux and Windows; macOS will still use the total system cores (see the sketch after this description).
  2. Exposed the number-of-cores selection for the MPI backend. The default is set to 1 and the maximum is set to the number of available cores.
  3. Fixed the issue where the backend sub-options (n cores) were not displayed for the default backend (Joblib) on startup.


closes #870
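The helper added in this PR (_available_cores) is not shown here; below is a minimal sketch, assuming psutil, of how such a helper could obtain the core count available to the process. The actual merged implementation may differ.

import sys
import psutil

def _available_cores():
    """Number of cores available to this process, not the machine total."""
    if sys.platform in ('linux', 'win32'):
        # The affinity mask reflects the cores actually allotted to the
        # process (e.g. by a Slurm partition on an HPC node).
        return len(psutil.Process().cpu_affinity())
    # macOS does not support cpu_affinity, so fall back to the system total.
    return psutil.cpu_count(logical=True)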

@@ -320,6 +321,9 @@ def __init__(self, theme_color="#802989",
        # load default parameters
        self.params = self.load_parameters(network_configuration)

        # Number of available cores
        self.n_cores = self._available_cores()
Collaborator

Usually, I've been told that one shouldn't use all the available cores when doing multiprocessing ... the user should be told how many cores are available and asked how many of those they want to use.

Collaborator Author

Yes, agreed. I documented the GUI's behavior in issue #870.

With the Joblib backend the number of cores is exposed and adjustable; with the MPI backend it is not. MPI uses the total number of cores minus 1. I was curious whether there was a reason it was designed this way.

Collaborator

I thought you could adjust the number of cores in MPIBackend too. Is that not the case? I think n_cores - 1 may not be the best default ... but I'm not an MPI expert. I know it's a bad idea with joblib.

Collaborator

I would just set it to 1 rather than attempt to guess the total number of cores. During the tutorial, the user can be instructed to use more cores. The less magic there is in the code, the less likely it is to fail.

@gtdang
Collaborator Author

gtdang commented Aug 26, 2024

OK. I've exposed the number-of-cores selection for MPI. The default is set to 1 and the maximum is the number of cores available to the process, rather than the total number of cores on the system.

Note that core-availability detection applies to Linux and Windows; macOS will just show the total system cores. This shouldn't be a problem, since the issue mostly arises when running on HPC nodes. The user can also now adjust the number of cores for MPI manually.
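For illustration, the exposed selection could look roughly like this, assuming ipywidgets (which the GUI already uses); the widget name here is hypothetical:

from ipywidgets import BoundedIntText

n_cores = 8  # e.g. the value returned by _available_cores()
widget_mpi_n_procs = BoundedIntText(
    value=1,              # default to a single process
    min=1,
    max=n_cores,          # cap at the cores available to the process
    description='Cores:')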

@gtdang gtdang changed the title [WIP] Fix GUI MPI available cores [MRG] Fix GUI MPI available cores Aug 26, 2024
@gtdang gtdang marked this pull request as ready for review August 26, 2024 17:53
@rythorpe
Contributor

Sorry, I didn't notice this until now. I just made a comment on the issue page. I'm not opposed to this change, but I also don't think it's absolutely necessary to expose the parallel backend API in the GUI, if only because it adds yet another variable for new users to trip over.

@dylansdaniels
Collaborator

dylansdaniels commented Aug 26, 2024

@rythorpe are you saying we should remove the exposed Backend option from the GUI entirely? We do switch the backend from Joblib to MPI via the GUI in the workshops to speed things up for users and make them aware of that functionality, so I think we do need it, though we can simplify it. I also weighed in on the issue page :)

@jasmainak
Collaborator

jasmainak commented Aug 27, 2024

I guess it's fair to say we should just use MPIBackend and default to Joblib if it's not available ... so there's no need to expose n_cores for JoblibBackend (expose it only for MPIBackend) and keep things simple.

@gtdang
Collaborator Author

gtdang commented Aug 27, 2024

I think users should be able to specify how much of the system's resources the GUI uses, for a number of reasons.

  1. They can multitask with other processes while running a long simulation.
  2. I've noticed that MPI can be slower when the number of trials is much smaller than the number of processes specified, likely due to overhead costs.
    • I tested this on Oscar with a 3-trial simulation specified with 48 cores vs. 3 cores. MPI with 48 cores was slower than both Joblib and MPI with 3 cores.
    • I can't test this again with 48 cores though; my account got suspended for resource overuse for an hour. 😅 But I could test it with a lower amount.

That said, I could only test this because I've exposed the number of cores attribute to the user!

@gtdang
Collaborator Author

gtdang commented Aug 27, 2024

I've been testing the workshop instances on Oscar (8 cores). I'm finding that MPI fails on runs with 6-8 cores specified, with the message below:

--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     CORE
   Node:        node1717
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------

My instance is partitioned 8 cores on a 32 core node.

(hnn_env) [gdang2@node1717 ~]$ nproc
8
(hnn_env) [gdang2@node1717 ~]$ python
Python 3.11.0 (main, Jan 10 2024, 14:25:36) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> multiprocessing.cpu_count()
32
>>> import psutil
>>> psutil.cpu_count(logical=False)
32

There must be some sort of hardware restriction that prevents process assignment to certain cores here. In this case we would want to use oversubscribe in the mpi command; however, MPIBackend won't apply that keyword because of the logic below. In my case n_procs < 32, so oversubscribe and hyperthreading will never be enabled.

# did user try to force running on more cores than available?
oversubscribe = False
if self.n_procs > n_logical_cores:
    oversubscribe = True

hyperthreading = False
if _has_mpi4py() and _has_psutil():
    import psutil
    n_physical_cores = psutil.cpu_count(logical=False)
    # detect if we need to use hwthread-cpus with mpiexec
    if self.n_procs > n_physical_cores:
        hyperthreading = True
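One possible adjustment, sketched here purely as an illustration (this is not the current hnn-core logic), would be to compare n_procs against the cores actually available to the process rather than the machine total:

import psutil

def _needs_oversubscribe(n_procs):
    # Hypothetical helper: compare against the cores allotted to this process.
    try:
        # affinity mask set by the scheduler (e.g. a Slurm cgroup)
        n_available = len(psutil.Process().cpu_affinity())
    except AttributeError:
        # cpu_affinity is not available on macOS; fall back to the system total
        n_available = psutil.cpu_count(logical=True)
    return n_procs > n_available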

@rythorpe
Contributor

A few things:

  1. I agree with @jasmainak that we should always default to MPIBackend for the GUI and use JoblibBackend as a backup.
  2. @dylansdaniels, in the old GUI I always found setting the simulation hyperparameters distracting to the flow of our workshops, so yeah, I guess I'm advocating for limiting backend exposure as much as possible. Users who need more advanced control of backend parameters should be encouraged to use the HNN-core API directly.
  3. @gtdang, I totally see your point that parallelizing with the maximal number of cores is not ideal. Given communication overhead, I've generally found that setting n_procs to more than 20 isn't beneficial for the default network. Another consideration in favor of exposing the backend API: if we ever host a workshop on shared resources (e.g., as we have in the past with Metacell), it may at times be more efficient to have individual users run a smaller number of parallel processes so that the class, as a whole, doesn't get bogged down.

Aside from changing the default to MPIBackend for the GUI as mentioned above, what if we fix the logic within MPIBackend to always allow oversubscription and then provide a reasonable default for n_procs in the GUI? If you still think it's necessary to expose the backend API in the GUI, I can get on board, but I do think we should be mindful about limiting the number of non-critical parameters in the GUI so that they're (at least visually) not overwhelming.

@dylansdaniels
Collaborator

dylansdaniels commented Aug 27, 2024

In addition to Metacell, we also have the GUI available on other shared resources / HPCs like NSG and Oscar. In future workshops, I believe the plan is to encourage more people to start using the GUI on NSG, and I wonder if we'll run into the issue @gtdang found where MPI fails.

@gtdang
Collaborator Author

gtdang commented Aug 28, 2024

I don't think the default for the GUI should be a backend that is an optional dependency, that can't be installed with the package, and that requires set-up that varies with the user's hardware and operating system.

@rythorpe
Contributor

As @jasmainak suggested, the idea would be to default to MPIBackend only if it's available, and then switch to JoblibBackend if it's not.

Is there perhaps another way we can streamline use of the MPIBackend for the GUI? I don't think we've ever run a workshop without explicitly using this backend because simulations take too long without it. With the previous GUI, there were a handful of steps that added to the frustration (at least for me) of just getting a single simulation to run in real-time during workshops, one of which was setting the number of cores over which to parallelize the simulation. In service of the vast majority of GUI use-cases, it'd be nice to streamline this better. As I mentioned previously, adding the number of cores as a simulation parameter in the GUI isn't, in itself, a deal breaker, but I think there's something to be said for a GUI simulation that "just runs" quickly without having to sift through a myriad of parameters that don't directly relate to the scientific discourse of our workshops.

@gtdang
Collaborator Author

gtdang commented Aug 28, 2024

OK. I think that checking for MPI and switching the backend on start-up should be easy to implement.

I agree that a change to the MPIBackend class is needed, though I don't think we yet understand the variability in behavior across different hardware set-ups well enough to define safe default parameters and arguments for the mpiexec call. I have only been able to test the GUI's behavior on my local macOS machine and on Oscar's virtual-machine instance. Since the MPIBackend class is part of the core API, we ought to also look at how it behaves with different system architectures and operating systems. For example, just in the world of HPC, I'm not sure how it will behave in a virtual-machine instance vs. a Slurm job.

Given that unknown, I feel it's easiest for now to expose the GUI argument and let users try different values that work for their system environment.

@gtdang
Collaborator Author

gtdang commented Aug 28, 2024

Added logic to check for the packages needed for MPI and to set the default backend accordingly: b5a07da
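Roughly, the check amounts to choosing the default based on whether the MPI dependencies import cleanly; the sketch below is illustrative only and the function name is hypothetical:

def _default_backend():
    # Pick MPI when its optional dependencies are importable, otherwise Joblib.
    try:
        import mpi4py  # noqa: F401
        import psutil  # noqa: F401
        return 'MPI'
    except ImportError:
        return 'Joblib'

# e.g. used at start-up to set the backend selection widget's value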

# Displays the default backend options
handle_backend_change(self.widget_backend_selection.value,
                      self._backend_config_out, self.widget_mpi_cmd,
                      self.widget_n_jobs)
Collaborator

I was actually suggesting something simpler ... hnn-core already defaults to JoblibBackend if MPI is not available, so all you had to do was expose MPIBackend and get rid of the JoblibBackend. Anyhow @dylansdaniels do you want this merged for the GUI tutorial today?

Collaborator Author

@gtdang gtdang Aug 28, 2024

That handler in the GUI changes the displayed options based on the backend drop-down; it doesn't switch between backends. In hnn-core, MPIBackend only switches to Joblib when there's a single process; I don't think it checks for MPI and reverts to Joblib if it can't find it.

Collaborator Author

Never mind, you're right. It does revert to Joblib. It runs on Joblib with only 1 process, though; is there a reason why it only uses 1?

Contributor

I think it's because specifying multiple processes with the Joblib backend, unlike the MPI backend, doesn't make sense for all types of simulations. Since Joblib parallelizes over trials, it only helps for simulations that include multiple trials. Plus, parallel Joblib processes are memory-intensive, so silently switching from MPI to Joblib can produce errors if the user wasn't anticipating the extra resources needed for a Joblib task with many trials.
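For context, a minimal sketch of trial-level parallelism with the hnn-core API (parameter values are illustrative): each trial becomes one Joblib job, so a single-trial simulation gains nothing from extra jobs.

from hnn_core import JoblibBackend, jones_2009_model, simulate_dipole

net = jones_2009_model()
with JoblibBackend(n_jobs=3):
    # one job per trial; with n_trials=1 the extra jobs would sit idle
    dpls = simulate_dipole(net, tstop=170., n_trials=3)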

Collaborator

I agree with @rythorpe. I would say the GUI should just use MPIBackend, and if that's not available, just run everything on a single core. It's probably a misnomer to call it "JoblibBackend" if it's running on a single core. Hence, I wouldn't bother exposing it to the user.

Contributor

Do we know of anyone who is actually using JoblibBackend in the GUI for their research? I personally don't know of anyone, and I'm generally under the impression that folks who prefer the GUI want a general-use parallel backend that works without too much fuss.

Your work @ntolley is unique in that your questions require lots of parameter sweeps, which are best served by using the API directly. Note that limited RAM is only an issue for Joblib, which I'm arguing should essentially default to running a single job within the context of the GUI.

Contributor

Pretty sure a lot of Darcy's initial work compared over multiple trials, although I'll concede she's using the API now. If you're doing 1-to-1 comparisons of ERPs, for instance, it's definitely more robust to compute an average over multiple trials. In tutorials we use 1 trial because it's efficient, but that isn't necessarily how you should fit these signals to data.

Contributor

Again I totally agree with making MPI with oversubscription the default, but I genuinely think trial-based parallelism is a useful feature even in the GUI setting!

These are my cards on the table 😄, but if I get outvoted I'll concede

Contributor

Fair enough. If GUI users use it, we should support it.

Collaborator

I personally feel that once a feature is added, users find ways to justify keeping it. That's why one should start with minimal features first and then add more based on necessity and feedback from the user base. Anyway, the call is yours @ntolley!

@rythorpe
Contributor

> OK. I think that checking for MPI and switching the backend on start-up should be easy to implement.
>
> I agree that a change to the MPIBackend class is needed, though I don't think we yet understand the variability in behavior across different hardware set-ups well enough to define safe default parameters and arguments for the mpiexec call. I have only been able to test the GUI's behavior on my local macOS machine and on Oscar's virtual-machine instance. Since the MPIBackend class is part of the core API, we ought to also look at how it behaves with different system architectures and operating systems. For example, just in the world of HPC, I'm not sure how it will behave in a virtual-machine instance vs. a Slurm job.
>
> Given that unknown, I feel it's easiest for now to expose the GUI argument and let users try different values that work for their system environment.

That's fair. I still think it's not ideal that MPI oversubscription isn't being invoked when it's supposed to be, but we can fix this later if you think more investigation is needed.

            value='Joblib',
            description='Backend:')
        self.widget_backend_selection = (
            Dropdown(options=[('Joblib', 'Joblib'),
Collaborator

don't bother exposing this to the user


Successfully merging this pull request may close these issues: MPI on HPC

5 participants