bpo-39207: Spawn workers on demand in ProcessPoolExecutor #19453

Merged
merged 7 commits into python:master from bpo39207-ppe-fix-idle-workers on Apr 19, 2020

Conversation

aeros
Contributor

@aeros aeros commented Apr 10, 2020

Roughly based on 904e34d, but with a few substantial differences.

/cc @pitrou @brianquinlan

https://bugs.python.org/issue39207

Automerge-Triggered-By: @pitrou

@aeros aeros added the performance (Performance or resource usage) label on Apr 10, 2020
@aeros
Contributor Author

aeros commented Apr 10, 2020

I think the macOS failure can be resolved by setting _idle_worker_semaphore = None in executor.shutdown(); it appears to be an issue with an excessive number of FDs in use. I'll see if that addresses the problem.
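
A minimal sketch of that idea (a hypothetical subclass for illustration; the attribute name follows this thread, and the eventual fix ended up taking a different route):

from concurrent.futures import ProcessPoolExecutor

class _SketchExecutor(ProcessPoolExecutor):
    def shutdown(self, wait=True):
        super().shutdown(wait=wait)
        # Drop the reference so the semaphore (and any file descriptor
        # backing it) can be collected once the executor shuts down.
        self._idle_worker_semaphore = None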

@aeros aeros force-pushed the bpo39207-ppe-fix-idle-workers branch from dea5df4 to 5e5e4f1 on April 10, 2020 01:23
@aeros
Contributor Author

aeros commented Apr 10, 2020

Hmm... I could potentially try using a weakref to the executor to access the semaphore (and then deleting the reference) instead of using the semaphore directly in _process_worker; in theory, this should allow it to be GC'd sooner, potentially freeing up the associated file descriptor earlier, and ensure we don't try to access it after the executor's resources have been finalized (which could be the main issue). For example:

# [executor_reference would be passed as an argument from _adjust_process_count()]
executor = executor_reference()
if executor is not None:
    executor._idle_worker_semaphore.release()
del executor

But, I'd also be willing to try any other possible solutions to address the MacOS failure. See the log for details.

Edit: Never mind, I just recalled that this won't work for processes, since weakrefs aren't picklable and so can't be passed to the worker. So, in order to access the semaphore through the executor weakref, the semaphore release would have to be moved to the executor management thread instead of happening within the worker process; I'll try that next.

@aeros
Contributor Author

aeros commented Apr 10, 2020

Releasing the semaphore through a weakref to the executor in the management thread addressed the problem. I'm certain it was an issue with file descriptors, but not 100% sure whether it was an excessive number of them being used at once or an attempt to access a file descriptor that had already been removed.

Ultimately, it results in a slight delay between when the worker finishes and when the idle semaphore is released (compared to releasing it immediately at the end of _process_worker()), but it will still be a significant overall improvement.
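
Roughly, in the management thread's result-handling loop (an approximation of the idea described above, not the exact merged code):

# [executor_reference is the weakref the management thread already holds]
# After a result item from a finished work item has been processed:
executor = executor_reference()
if executor is not None:
    # The worker that produced this result is idle again.
    executor._idle_worker_semaphore.release()
# Drop the strong reference so the executor can still be garbage-collected.
del executor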

@aeros aeros added the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label on Apr 10, 2020
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @aeros for commit 28bb669 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label on Apr 10, 2020
Comment on lines 997 to 1002
executor = self.executor_type()
executor.submit(mul, 12, 7).result()
executor.submit(mul, 33, 25)
executor.submit(mul, 25, 26).result()
executor.submit(mul, 18, 29)
self.assertEqual(len(executor._processes), 2)
Contributor Author

@aeros aeros Apr 10, 2020


This test might be subject to race conditions, as indicated in https://buildbot.python.org/all/#/builders/296/builds/46:

======================================================================
FAIL: test_idle_process_reuse_multiple (test.test_concurrent_futures.ProcessPoolSpawnProcessPoolExecutorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/buildarea/pull_request.angelico-debian-amd64/build/Lib/test/test_concurrent_futures.py", line 1002, in test_idle_process_reuse_multiple
    self.assertEqual(len(executor._processes), 2)
AssertionError: 1 != 2

Instead, I think it would probably make more sense to check self.assertTrue(len(executor._processes) <= 2), after explicitly setting max workers to a specific amount, e.g. executor = self.executor_type(4), and then submitting _max_workers jobs. It's perfectly okay if there are fewer than 2 workers as a result of the jobs being completed quickly, but there should never be more than two if idle workers are being properly reused (since we directly waited for the result() on two of them).
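
Roughly something like this (a sketch of the suggested change, using assertLessEqual as later adopted, not the test as finally merged):

def test_idle_process_reuse_multiple(self):
    executor = self.executor_type(4)  # explicit max_workers
    executor.submit(mul, 12, 7).result()
    executor.submit(mul, 33, 25)
    executor.submit(mul, 25, 26).result()
    executor.submit(mul, 18, 29)
    # Fewer than 2 processes is fine if jobs finish quickly; more than 2
    # would mean idle workers are not being reused.
    self.assertLessEqual(len(executor._processes), 2)
    executor.shutdown()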

I'll take a look at this again tomorrow, after the buildbot tests have completed.

@pitrou
Member

pitrou commented Apr 10, 2020

@mrocklin, would you have any high-level opinion on this change?

@mrocklin

From a performance perspective I guess that the point here is that we're making it faster to create a ProcessPoolExecutor, but making the first few tasks (or the first few times when we have enough concurrent tasks) slower. There might be some cost in that spawning many processes concurrently at startup could have been faster than spawning them one by one during execution (maybe).

There are use cases where I would not be surprised if users wanted to create all of the processes at startup, probably because they want more predictable performance behavior. It might be nice if there were some convenient method to ensure that there are as many worker processes as requested workers. The people who are likely to care about this, though, are probably sophisticated enough to map time.sleep or something similar themselves.
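
For example, something along these lines could warm the pool up front (a hypothetical usage sketch, not an API this PR adds):

import time
from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor(max_workers=8)
# Submit one short task per requested worker; as long as none of them
# finishes before the last is submitted, all 8 processes get spawned.
list(executor.map(time.sleep, [0.1] * 8))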

Anyway, in short I don't have strong opinions either way. Personally, I had already assumed that the ProcessPoolExecutor created processes on demand (probably just because of familiarity with the ThreadPoolExecutor).

@aeros
Contributor Author

aeros commented Apr 10, 2020

From a performance perspective I guess that the point here is that we're making it faster to create a ProcessPoolExecutor, but making the first few tasks (or the first few times when we have enough concurrent tasks) slower.

In the issue discussion, there is also the case where someone uses the default number of max workers (os.cpu_count()) on a device with a large number of cores. They may not need all of them for their jobs to be completed concurrently, but they end up incurring the cost of spawning all of the processes at startup, only for many of them to sit idle for the full duration of the executor.

The above would also apply to users with a smaller number of cores, but I think the most significant impact of this change would apply to users with a large number of cores.

It might be nice if there were some convenient method to ensure that there are as many worker processes as requested workers. The people who are likely to care about this, though, are probably sophisticated enough to map time.sleep or something similar themselves.

It could be possible to consider an implementation of _adjust_process_count() that spawns a worker process for each pending work item, i.e.:

# Spawn one worker per pending work item, capped at max_workers.
while (len(self._processes) < len(self._pending_work_items)
        and len(self._processes) < self._max_workers):
    p = self._mp_context.Process(
        target=_process_worker,
        args=(self._call_queue,
              self._result_queue,
              self._initializer,
              self._initargs))
    p.start()
    self._processes[p.pid] = p

The tricky part, though, is that the length of the self._pending_work_items dict is only an estimate when measured across threads (since it's being modified constantly in the executor management thread), and self._processes of course includes workers that are already in use. So, I'm not certain how helpful it would be in reality.
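
For comparison, the submit()-side change in this PR is roughly the opposite: only spawn a new worker when no idle one is available (a sketch approximating the merged change, not the exact diff):

def _adjust_process_count(self):
    # Reuse an idle worker if one is available instead of spawning.
    if self._idle_worker_semaphore.acquire(blocking=False):
        return

    if len(self._processes) < self._max_workers:
        p = self._mp_context.Process(
            target=_process_worker,
            args=(self._call_queue,
                  self._result_queue,
                  self._initializer,
                  self._initargs))
        p.start()
        self._processes[p.pid] = p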

Personally, I had already assumed that the ProcessPoolExecutor created processes on demand (probably just because of familiarity with the ThreadPoolExecutor).

I'd wager that's the current mentality of most users. Unless they've explored the internals of ProcessPoolExecutor extensively, they'd have no reason to assume it handled worker spawning any differently from ThreadPoolExecutor. This is reinforced by the docs, which suggest that it spawns up to max_workers processes rather than always spawning max_workers processes:

Executor subclass that executes calls asynchronously using a pool of at most max_workers processes.

Member

@pitrou pitrou left a comment


This looks basically good to me, just a couple comments.

Lib/concurrent/futures/process.py (outdated review thread, resolved)
Lib/test/test_concurrent_futures.py (outdated review thread, resolved)
Lib/test/test_concurrent_futures.py (outdated review thread, resolved)
Lib/test/test_concurrent_futures.py (review thread, resolved)
aeros and others added 2 commits April 18, 2020 19:50
* Use try-finally for executor.shutdown() in test_saturation
* Use assertLessEqual
* Simplify semaphore acquire

Co-authored-by: Antoine Pitrou <[email protected]>
@aeros
Contributor Author

aeros commented Apr 19, 2020

Thanks for the review @pitrou, I believe I've addressed all of the comments. Would you care to give it a final look-over? This would be my first self-authored merge.

Member

@pitrou pitrou left a comment


Thank you @aeros. I will merge now.

@miss-islington miss-islington merged commit 1ac6e37 into python:master Apr 19, 2020
@vstinner
Member

@pitrou: You could have let @aeros merge the PR, he is now a core dev 😉

@pitrou
Member

pitrou commented Apr 19, 2020

Ah, sorry, I hadn't thought about that.

@aeros
Contributor Author

aeros commented Apr 19, 2020

No worries :)

@aeros aeros deleted the bpo39207-ppe-fix-idle-workers branch April 19, 2020 22:48
gpshead added a commit to gpshead/cpython that referenced this pull request Jan 24, 2022
bpo-39207: Spawn workers on demand in ProcessPoolExecutor (pythonGH-19453) was
implemented in a way that introduced child process deadlocks in some
environments due to mixing threading and fork() in the parent process.