-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-33613, test_semaphore_tracker_sigint: fix race condition #7850
Changes from 3 commits
5f1df08
4f9babb
26d81a3
a7d5c59
a735d49
9460da4
fb136d0
f3d0f0b
f5e5cf9
4393021
d501f2f
9e0243e
26e077b
a50912b
bcbb942
0b46f07
66ef7dc
f7d2d4d
4a09fde
c9f96c1
c0a088c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,14 +60,14 @@ def ensure_running(self): | |
fds_to_pass.append(sys.stderr.fileno()) | ||
except Exception: | ||
pass | ||
cmd = 'from multiprocessing.semaphore_tracker import main;main(%d)' | ||
cmd = 'from multiprocessing.semaphore_tracker import main;main({}, {})' | ||
r, w = os.pipe() | ||
try: | ||
fds_to_pass.append(r) | ||
# process will out live us, so no need to wait on pid | ||
exe = spawn.get_executable() | ||
args = [exe] + util._args_from_interpreter_flags() | ||
args += ['-c', cmd % r] | ||
args += ['-c', cmd.format(r, sys.stderr.fileno())] | ||
pid = util.spawnv_passfds(exe, args, fds_to_pass) | ||
except: | ||
os.close(w) | ||
|
@@ -105,7 +105,7 @@ def _send(self, cmd, name): | |
getfd = _semaphore_tracker.getfd | ||
|
||
|
||
def main(fd): | ||
def main(fd, fd_write): | ||
'''Run semaphore tracker.''' | ||
# protect the process from ^C and "killall python" etc | ||
signal.signal(signal.SIGINT, signal.SIG_IGN) | ||
|
@@ -128,6 +128,8 @@ def main(fd): | |
cache.add(name) | ||
elif cmd == b'UNREGISTER': | ||
cache.remove(name) | ||
elif cmd == b'PING': | ||
os.write(fd_write, b"PONG\n") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
else: | ||
raise RuntimeError('unrecognized command %r' % cmd) | ||
except Exception: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,7 @@ | |
import struct | ||
import operator | ||
import weakref | ||
import warnings | ||
import test.support | ||
import test.support.script_helper | ||
from test import support | ||
|
@@ -4472,17 +4473,34 @@ def check_semaphore_tracker_death(self, signum, should_die): | |
# bpo-31310: if the semaphore tracker process has died, it should | ||
# be restarted implicitly. | ||
from multiprocessing.semaphore_tracker import _semaphore_tracker | ||
_semaphore_tracker.ensure_running() | ||
pid = _semaphore_tracker._pid | ||
if pid: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems like the multiprocessing code uses "pid is not None". Technically, I'm not sure that it's possible to have a pid of 0, since pid 0 has a special meaning in many C functions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm now confused, you modified "if not pid" in the other file, whereas it seems like os.waitpid() doesn't return None. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I did not understand correctly your suggestion. For some reason, I inferred that with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed in 822516a1403d4eaec9ac743861336f0902d8186e |
||
os.kill(pid, signal.SIGKILL) | ||
time.sleep(0.5) # give it time to die | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would it be possible to wait until the process completes instead of using a sleep? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sadly no AFAIK, because the tracker relies on calling
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can make the child capture There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not using os.waitpid() in the test, and modify ensure_running() to handle ChildProcessError? The purpose of the waitpid() call in ensure_running() is to check if the process died. We don't care of the exit status. Adding time.sleep(0.5) means adding a new race condition, while fixing another, no? |
||
old_stderr = sys.stderr | ||
r, w = os.pipe() | ||
try: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Notice that we cannot use |
||
sys.stderr = open(w, "bw") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This line should be before the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done in d075473 |
||
with warnings.catch_warnings(record=True) as all_warn: | ||
_semaphore_tracker.ensure_running() | ||
pid = _semaphore_tracker._pid | ||
# Wait until we receive the PONG from the child, indicating that | ||
# the signal handlers have been registered. See bpo-33613 for more | ||
# information. | ||
_semaphore_tracker._send("PING", "") | ||
with open(r, "rb") as pipe: | ||
data = pipe.readline() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we okay with this potentially hanging indefinitely if something goes wrong? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Commit d075473 implements a way to fail if reading takes too long. |
||
if b"PONG" not in data: | ||
raise ValueError("Invalid data in stderr!") | ||
finally: | ||
sys.stderr.close() | ||
sys.stderr = old_stderr | ||
|
||
os.kill(pid, signum) | ||
time.sleep(1.0) # give it time to die | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do you reduce time? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because one second seems a lot to me for the signal to be delivered (checked in the buildbot) and this will make the test suite run a bit faster. I can undo this if we want to be conservative :). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please keep 1 second. While you saw the test passing, I'm sure that reducing the sleep will make the test failure more likely on some very busy buildbots. |
||
ctx = multiprocessing.get_context("spawn") | ||
with contextlib.ExitStack() as stack: | ||
if should_die: | ||
stack.enter_context(self.assertWarnsRegex( | ||
UserWarning, | ||
"semaphore_tracker: process died")) | ||
with warnings.catch_warnings(record=True) as all_warn: | ||
sem = ctx.Semaphore() | ||
sem.acquire() | ||
sem.release() | ||
|
@@ -4492,6 +4510,14 @@ def check_semaphore_tracker_death(self, signum, should_die): | |
del sem | ||
gc.collect() | ||
self.assertIsNone(wr()) | ||
if should_die: | ||
self.assertEqual(len(all_warn), 1) | ||
the_warn = all_warn[0] | ||
issubclass(the_warn.category, UserWarning) | ||
serhiy-storchaka marked this conversation as resolved.
Show resolved
Hide resolved
|
||
self.assertTrue("semaphore_tracker: process died" | ||
in str(the_warn.message)) | ||
else: | ||
self.assertEqual(len(all_warn), 0) | ||
|
||
def test_semaphore_tracker_sigint(self): | ||
# Catchable signal (ignored by semaphore tracker) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if
sys.stderr
isn't an actual file:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was relaying on the fact that the tracker seems to work under the assumption that
sys.stderr
has a file descriptor associated:https://github.com/python/cpython/blob/master/Lib/multiprocessing/semaphore_tracker.py#L60
Relevant lines:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou I am missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the
except Exception
should be clear, no? :-)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, my bad. :)
Let me investigate what options do we have. Do you have a preferred approach on how to handle this?