
(Re)implement threading comms #1156

Merged: 9 commits into develop on Nov 7, 2023

Conversation

@AngelFP (Contributor) commented Nov 6, 2023

As discussed in past meetings, we are currently trying to support running Optimas in Jupyter notebooks to allow for a more interactive workflow. However, when testing this a couple of days ago, I realized that this is currently not possible due to the use of multiprocessing for the local communications. Basically, multiprocessing and IPython/Jupyter don't play well together (especially on Windows and Mac, where spawn is the default start method), and I couldn't get optimas to run without significant workarounds.
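To make the failure mode concrete, here is a generic Python illustration (not optimas or libEnsemble code): with the spawn start method, child processes must re-import the function they are asked to run, and a function defined interactively only exists in the notebook's `__main__`, which the children cannot import.

```python
# Generic illustration of the spawn limitation; not optimas/libEnsemble code.
import multiprocessing as mp


def evaluate(x):
    # When defined in a notebook cell, this function lives only in an
    # un-importable __main__ module.
    return x**2


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # default start method on Windows and macOS
    with ctx.Pool(2) as pool:
        # Works when run as a script; from IPython/Jupyter the spawned children
        # cannot re-import `evaluate` and the call fails with a pickling error.
        print(pool.map(evaluate, [1, 2, 3]))
```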

The ideal solution would be for libEnsemble to support a threaded local mode. I gave this a try and found that there was an early implementation of this in the main branch (it was removed in #1123). I updated the QCommThread class to be more similar to QCommProcess and exposed a new local_threading option in libe_specs['comms']. Using this option lets optimas run in a notebook without any modifications or workarounds by the user.
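For reference, selecting the new mode from a notebook would look something like this (only the "local_threading" value is what this PR adds; the other keys and the commented-out call are just the usual libEnsemble setup, shown for context):

```python
from libensemble.libE import libE

# Workers run as threads in the same process instead of separate processes.
libe_specs = {
    "comms": "local_threading",
    "nworkers": 4,
}

# H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, libe_specs=libe_specs)
```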

There are still some issues in the current PR. For example, I had to set log_comm=False in the threading backend to prevent an infinite "loop" of logs that keeps filling up the outbox; I suspect this arises because the workers now share the same memory. The shared memory might also cause issues with how resources are currently assigned to workers.

Having this threading backend would be great, and the suggested implementation already solves the issues that we need to (urgently) address in optimas. Do you think it is possible to fully support this mode in libEnsemble?

As a bonus, I think this would enable an alternative solution to optimas-org/optimas#134, where we could define an option such as libe_specs["run_persis_gen_on_thread"] = True to run the generator on a thread (that is, sharing memory with the manager).

@AngelFP marked this pull request as draft November 6, 2023 10:22
@jlnav (Member) commented Nov 6, 2023

Hey @AngelFP! We'd be happy to formally support threading comms, and it's especially great that the implementation was easy since it's restoring code we recently removed :) Thanks!

Either you or we can do this, but much of test_comms.py in libensemble/tests/unit_tests also needs to be restored.

@AngelFP (Contributor, Author) commented Nov 6, 2023

Sounds great! Note that QCommThread is not exactly like the old one. I forget what the issue was, but I had to make some changes to get it to work.

If you are fine with using this PR, I can add back the relevant tests.

@AngelFP (Contributor, Author) commented Nov 6, 2023

Hmm, I had a look at the old tests just now, and they rely on some classes that aren't used anywhere else (and which I didn't need to restore). Would it be better to define new tests only for QCommThread that mirror those of QCommProcess, like the ones in tests/test_aaa_comms.py?
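Roughly what I have in mind, as a sketch (the constructor and the send/recv/result calls are assumed to mirror QCommProcess and the existing tests, so the details would need checking against this PR):

```python
from libensemble.comms.comms import QCommThread


def worker_main(comm):
    # Echo one message back to the parent side, then return a value.
    comm.send("echo", comm.recv())
    return 42


def test_qcomm_thread_roundtrip():
    # Assumes QCommThread is usable as a context manager with send/recv/result,
    # in the same way QCommProcess is used in tests/test_aaa_comms.py.
    with QCommThread(worker_main) as comm:
        comm.send("hello")
        assert comm.recv(timeout=5)[0] == "echo"
        assert comm.result(timeout=5) == 42
```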

@jlnav (Member) commented Nov 6, 2023

> Hmm, I had a look at the old tests just now, and they rely on some classes that aren't used anywhere else (and which I didn't need to restore). Would it be better to define new tests only for QCommThread that mirror those of QCommProcess, like the ones in tests/test_aaa_comms.py?

Sounds good to me! (although looking at those tests they aren't as substantial as I remembered, but maybe that's okay since the comms are tested everywhere else)

@shuds13 (Member) commented Nov 6, 2023

We can support this and see how well it works, but I would say use it with caution. Threads share an environment, including the current working directory. Environment variables like CUDA_VISIBLE_DEVICES will be shared unless we move that setting into a sub-process. These kinds of issues are why we did not use the threading interface in the past.
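For example (plain Python, just to illustrate the shared-environment point, not libEnsemble code):

```python
import os
import threading


def worker(worker_id):
    # Intended as a per-worker GPU assignment, but os.environ is shared by
    # every thread in the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(worker_id)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Only one value survives for the whole process: whichever thread wrote last.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```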

@codecov bot commented Nov 6, 2023

Codecov Report

Merging #1156 (5fc9e55) into develop (6c6554d) will decrease coverage by 0.19%.
Report is 4 commits behind head on develop.
The diff coverage is 57.83%.

@@             Coverage Diff             @@
##           develop    #1156      +/-   ##
===========================================
- Coverage    71.51%   71.33%   -0.19%     
===========================================
  Files           88       88              
  Lines         7987     8065      +78     
  Branches      1427     1433       +6     
===========================================
+ Hits          5712     5753      +41     
- Misses        2022     2055      +33     
- Partials       253      257       +4     
Files                        Coverage           Δ
libensemble/specs.py         97.39% <100.00%>   (ø)
libensemble/libE.py          68.25% <50.00%>    (-0.83%) ⬇️
libensemble/comms/comms.py   72.16% <57.33%>    (-8.13%) ⬇️

... and 3 files with indirect coverage changes


@AngelFP (Contributor, Author) commented Nov 6, 2023

> We can support this and see how well it works, but I would say use it with caution. Threads share an environment, including the current working directory. Environment variables like CUDA_VISIBLE_DEVICES will be shared unless we move that setting into a sub-process. These kinds of issues are why we did not use the threading interface in the past.

Makes sense. The idea from the optimas side would be to enable threading only for light function evaluations that do not require the allocation of resources (num_procs or num_gpus) or the creation of simulation directories. So, basically, only for user-defined functions evaluated with the FunctionEvaluator, not with the TemplateEvaluator that we use for running heavy simulations.

@AngelFP marked this pull request as ready for review November 6, 2023 20:22
@shuds13 merged commit 9b4e59c into Libensemble:develop Nov 7, 2023
12 of 14 checks passed
@shuds13 mentioned this pull request Nov 7, 2023