
(Re)implement threading comms #1156

Merged: 9 commits into develop on Nov 7, 2023

Conversation

@AngelFP (Contributor) commented Nov 6, 2023

As discussed in past meetings, we are currently trying to support running Optimas in Jupyter notebooks to allow for a more interactive workflow. However, when testing this a couple of days ago, I realized that this is currently not possible due to the use of multiprocessing for the local communications. Basically, multiprocessing and IPython/Jupyter don't play well together (especially on Windows and Mac, where spawn is the default start method), and I couldn't get optimas to run without significant workarounds.
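To make the failure mode concrete, here is a generic Python illustration (not optimas or libEnsemble code): with the spawn start method, child processes must re-import the function they are asked to run, and a function defined interactively only exists in the notebook's `__main__`, which the children cannot import.

```python
# Generic illustration of the spawn limitation; not optimas/libEnsemble code.
import multiprocessing as mp


def evaluate(x):
    # When defined in a notebook cell, this function lives only in an
    # un-importable __main__ module.
    return x**2


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # default start method on Windows and macOS
    with ctx.Pool(2) as pool:
        # Works when run as a script; from IPython/Jupyter the spawned children
        # cannot re-import `evaluate` and the call fails with a pickling error.
        print(pool.map(evaluate, [1, 2, 3]))
```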

The ideal solution would be for libEnsemble to support a threaded local mode. I gave this a try and found that there was an early implementation of this in the main branch (it was removed in #1123). I updated the QCommThread class to be more similar to QCommProcess and exposed a new local_threading option in libe_specs['comms']. Using this option lets optimas run in a notebook without any modifications or workarounds by the user.
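For reference, selecting the new mode from a notebook would look something like this (only the "local_threading" value is what this PR adds; the other keys and the commented-out call are just the usual libEnsemble setup, shown for context):

```python
from libensemble.libE import libE

# Workers run as threads in the same process instead of separate processes.
libe_specs = {
    "comms": "local_threading",
    "nworkers": 4,
}

# H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, libe_specs=libe_specs)
```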

There are still some issues in the current PR. For example, I had to set log_comm=False in the threading backend to prevent an infinite "loop" of logs that keeps filling up the outbox; I suspect this arises because the workers now share the same memory. The shared memory might also cause issues with how resources are currently assigned to workers.

Having this threading backend would be great, and the suggested implementation already solves the issues that we need to (urgently) address in optimas. Do you think it is possible to fully support this mode in libEnsemble?

As a bonus, I think this would enable an alternative solution to optimas-org/optimas#134, where we could define an option such as libe_specs["run_persis_gen_on_thread"] = True to run the generator on a thread (that is, sharing memory with the manager).

@AngelFP marked this pull request as draft November 6, 2023 10:22
@jlnav (Member) commented Nov 6, 2023

Hey @AngelFP! We'd be happy to formally support threading comms, and it's especially great that the implementation was easy since it's restoring code we recently removed :) Thanks!

Either you or we can do this, but much of test_comms.py in libensemble/tests/unit_tests also needs to be restored.

@AngelFP (Contributor, Author) commented Nov 6, 2023

Sounds great! Note that QCommThread is not exactly like the old one. I forget what the issue was, but I had to make some changes to get it to work.

If you are fine with using this PR, I can add back the relevant tests.

@AngelFP (Contributor, Author) commented Nov 6, 2023

Hmm, I had a look at the old tests just now, and they rely on some classes that aren't used anywhere else (and which I didn't need to restore). Would it be better to define new tests only for QCommThread that mirror those of QCommProcess, like the ones in tests/test_aaa_comms.py?
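Roughly what I have in mind, as a sketch (the constructor and the send/recv/result calls are assumed to mirror QCommProcess and the existing tests, so the details would need checking against this PR):

```python
from libensemble.comms.comms import QCommThread


def worker_main(comm):
    # Echo one message back to the parent side, then return a value.
    comm.send("echo", comm.recv())
    return 42


def test_qcomm_thread_roundtrip():
    # Assumes QCommThread is usable as a context manager with send/recv/result,
    # in the same way QCommProcess is used in tests/test_aaa_comms.py.
    with QCommThread(worker_main) as comm:
        comm.send("hello")
        assert comm.recv(timeout=5)[0] == "echo"
        assert comm.result(timeout=5) == 42
```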

@jlnav (Member) commented Nov 6, 2023

> Hmm, I had a look at the old tests just now, and they rely on some classes that aren't used anywhere else (and which I didn't need to restore). Would it be better to define new tests only for QCommThread that mirror those of QCommProcess, like the ones in tests/test_aaa_comms.py?

Sounds good to me! (although looking at those tests they aren't as substantial as I remembered, but maybe that's okay since the comms are tested everywhere else)

@shuds13 (Member) commented Nov 6, 2023

We can support this and see how well it works, but I would say use it with caution. Threads share an environment, including the current working directory. Environment variables like CUDA_VISIBLE_DEVICES will be shared unless we move that setting into a sub-process. These kinds of issues are why we did not use the threading interface in the past.
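For example (plain Python, just to illustrate the shared-environment point, not libEnsemble code):

```python
import os
import threading


def worker(worker_id):
    # Intended as a per-worker GPU assignment, but os.environ is shared by
    # every thread in the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(worker_id)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Only one value survives for the whole process: whichever thread wrote last.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```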

@codecov bot commented Nov 6, 2023

Codecov Report

Merging #1156 (5fc9e55) into develop (6c6554d) will decrease coverage by 0.19%.
Report is 4 commits behind head on develop.
The diff coverage is 57.83%.

@@             Coverage Diff             @@
##           develop    #1156      +/-   ##
===========================================
- Coverage    71.51%   71.33%   -0.19%     
===========================================
  Files           88       88              
  Lines         7987     8065      +78     
  Branches      1427     1433       +6     
===========================================
+ Hits          5712     5753      +41     
- Misses        2022     2055      +33     
- Partials       253      257       +4     
Files                        Coverage           Δ
libensemble/specs.py         97.39% <100.00%>   (ø)
libensemble/libE.py          68.25% <50.00%>    (-0.83%) ⬇️
libensemble/comms/comms.py   72.16% <57.33%>    (-8.13%) ⬇️

... and 3 files with indirect coverage changes


@AngelFP (Contributor, Author) commented Nov 6, 2023

> We can support this and see how well it works, but I would say use it with caution. Threads share an environment, including the current working directory. Environment variables like CUDA_VISIBLE_DEVICES will be shared unless we move that setting into a sub-process. These kinds of issues are why we did not use the threading interface in the past.

Makes sense. The idea from the optimas side would be to enable threading only for light function evaluations that do not require the allocation of resources (num_procs or num_gpus) or the creation of simulation directories. So, basically, only for user-defined functions evaluated with the FunctionEvaluator, not with the TemplateEvaluator that we use for running heavy simulations.

@AngelFP marked this pull request as ready for review November 6, 2023 20:22
@shuds13 merged commit 9b4e59c into Libensemble:develop Nov 7, 2023
12 of 14 checks passed
@shuds13 mentioned this pull request Nov 7, 2023