
Automatically activate Python virtual environment on compute nodes #3541

Open

giordano opened this issue Jul 25, 2024 · 4 comments

giordano (Contributor) commented Jul 25, 2024

Is your feature request related to a problem? Please describe.

When you submit a pipeline with Parsl from a login node, to be executed on a compute node, you likely need to activate the Python virtual environment in which Parsl runs inside the worker_init attribute of the executor's provider. But this is not very user friendly: the user just had to activate that same environment to launch the pipeline in the first place, so they are being asked to repeat the same operation twice. I was told on Slack that there have been some thoughts about automating this in the past, but no real work.
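To make the repetition concrete, here is a minimal sketch of a typical configuration; the provider, label, and paths are hypothetical, for illustration only:

from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider

# Hypothetical configuration: the environment was already activated to run
# this script, yet worker_init has to spell out the same activation again
# for the compute nodes.
config = Config(executors=[
    HighThroughputExecutor(
        label="htex",
        provider=SlurmProvider(
            worker_init="source /path/to/venv/bin/activate",  # hypothetical path
        ),
    ),
])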

Describe the solution you'd like

Ideally Parsl would automate the Python environment activation. The challenge is that answering the questions "What is the current environment, and how do I activate it?" is very complicated in the general case, because the answer also depends on the package manager in use.
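As a rough illustration of the difficulty, a best-effort detection covering only the most common cases (conda and venv/virtualenv) might look like the sketch below; each package manager signals activation differently, and this is far from exhaustive:

import os
import sys

# Best-effort environment detection; none of these checks is authoritative,
# and tools like pyenv, poetry, or spack would need their own handling.
def describe_environment() -> str:
    if os.environ.get("CONDA_PREFIX"):
        return f"conda environment at {os.environ['CONDA_PREFIX']}"
    if os.environ.get("VIRTUAL_ENV"):
        return f"venv/virtualenv at {os.environ['VIRTUAL_ENV']}"
    if sys.prefix != sys.base_prefix:
        return f"virtual environment at {sys.prefix} (no activation variable set)"
    return "no virtual environment detected"

print(describe_environment())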

Describe alternatives you've considered

The alternative is to do nothing and let the user figure it out, but I find that frustrating: the reason I want to use Parsl is to simplify things, not to complicate them.

Additional context

N/A

giordano (Contributor, Author) commented Jul 25, 2024

I'm not sure this would work and I haven't played with it, but I was wondering if replacing

# hope the python environment was activated
CMD() {
    process_worker_pool.py ...
}

with something like

# no need to activate the environment
CMD() {
    $(python found with sys.executable) $(path to script found with shutil.which("process_worker_pool.py")) ...
}

would be a viable option.
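A concrete sketch of that idea in Python, resolving both paths on the submit side (the trailing "..." stands for the usual worker pool options, as above):

import shutil
import sys

# Resolve the interpreter and the worker pool script on the submit side, so
# the command does not depend on the environment being activated on the
# compute node.
worker_pool = shutil.which("process_worker_pool.py")
cmd = f"{sys.executable} {worker_pool} ..."  # "..." stands for the usual options
print(cmd)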

benclifford (Collaborator) commented
@giordano I've wondered about that before. In some situations it will work, but not all of them. I think it would be easy enough to start using it yourself as a hack in your own setups, and see how that works.

Alternatively, process_worker_pool.py is executable and contains a hard-coded reference to the relevant Python, so remove the whole Python prefix and run it as if it's just some other Unix executable...?
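For reference, that hard-coded interpreter reference is presumably the shebang that pip writes into the script at install time; a quick way to inspect it:

import shutil

# Sketch: pip rewrites the script's shebang to point at the Python of the
# installing environment, which is what lets the script run without an
# explicit "python" prefix.
script = shutil.which("process_worker_pool.py")
if script is not None:
    with open(script) as f:
        print(f.readline().rstrip())  # e.g. "#!/path/to/venv/bin/python3"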

giordano (Contributor, Author) commented
I tried

diff --git a/parsl/executors/high_throughput/executor.py b/parsl/executors/high_throughput/executor.py
index 6918336..48bff4b 100644
--- a/parsl/executors/high_throughput/executor.py
+++ b/parsl/executors/high_throughput/executor.py
@@ -1,7 +1,9 @@
 import logging
 import math
 import pickle
+import shutil
 import subprocess
+import sys
 import threading
 import typing
 import warnings
@@ -36,7 +38,8 @@ from parsl.utils import RepresentationMixin
 
 logger = logging.getLogger(__name__)
 
-DEFAULT_LAUNCH_CMD = ("process_worker_pool.py {debug} {max_workers_per_node} "
+DEFAULT_LAUNCH_CMD = (f"{sys.executable} "
+                      f"{shutil.which('process_worker_pool.py')} {{debug}} {{max_workers_per_node}} "
                       "-a {addresses} "
                       "-p {prefetch_capacity} "
                       "-c {cores_per_worker} "

as a quick and dirty hack (I didn't want to change too much code just to test this solution), and it seems to work for my use case: I can run the pipeline without activating the environment first. Even if this doesn't solve all problems, could something like this be a decent option?
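Worth noting: the patched DEFAULT_LAUNCH_CMD is an f-string evaluated at import time on the submit side, so the interpreter and script paths are baked in there rather than resolved on the compute node. Since HighThroughputExecutor takes a launch_cmd parameter, a similar effect might be achievable without patching Parsl at all; a sketch, assuming DEFAULT_LAUNCH_CMD is importable under the name shown in the diff:

import shutil
import sys

from parsl.executors import HighThroughputExecutor
from parsl.executors.high_throughput.executor import DEFAULT_LAUNCH_CMD

# Sketch: prefix the stock launch command with the submit-side interpreter
# and the resolved script path (shutil.which may return None if the script
# is not on $PATH, so this is not robust as written).
launch_cmd = DEFAULT_LAUNCH_CMD.replace(
    "process_worker_pool.py",
    f"{sys.executable} {shutil.which('process_worker_pool.py')}",
    1,
)

executor = HighThroughputExecutor(label="htex", launch_cmd=launch_cmd)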

benclifford (Collaborator) commented

Tagging @rjmello because he's working on something related, in the context of Globus Compute.

I think we need to be careful about the principles behind this rather than adding complexity without understanding why changes work.

The patch @giordano posted above makes two changes:

  • it forces process worker pool to use the Python that is running the user workflow
  • it stops process worker pool from using the Python that was hard coded at install time of the package

It still uses $PATH to discover process worker pool.

What are the situations where this improves things?

Why would the process worker pool be running with the wrong Python? Why would sys.executable not be the Python that is already hard-coded at the top of process_worker_pool.py by pip install?
(For example, is it because the wrong process worker pool has been found on the path? In that case shutil.which('process_worker_pool.py') would still run the wrong process worker pool.)

I guess @giordano has an environment in which this improves things, so it would be interesting to understand these questions in the context of that environment.
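One hypothetical way to investigate in such an environment is to compare the workflow's interpreter with the shebang of whichever worker pool script $PATH discovers:

import shutil
import sys

# Hypothetical diagnostic: a mismatch between these two interpreters would
# explain why the patch changes behaviour; if they agree, the patch should
# be a no-op in that environment.
pool = shutil.which("process_worker_pool.py")
print("sys.executable:", sys.executable)
print("pool on $PATH: ", pool)
if pool is not None:
    with open(pool) as f:
        print("pool shebang:  ", f.readline().strip())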
