Deadlock in threaded application when using sys._current_frames
#106883
Comments
Oh, after several modifications to the interpreter, anything can happen during GC. In this case, it releases the GIL and allows other threads to get deadlocked. Also, GC itself can already trigger deadlocks. "GC being triggered with a lock held" has caused several problems. See the discussion in #106207.
We are also seeing this issue at work. I found one reproducible example running Python 3.11.7 (see below); it is much more difficult to reproduce on higher-spec machines. On my 8-core cloud VM, I had to run 8 processes at the same time. After a few minutes, a few processes no longer produce output, although they are still using some CPU. Since GC is involved here, I haven't been able to reproduce it on Python 3.12.1, likely due to the same 3.12 change to only run GC on the eval breaker, as mentioned by @pablogsal here: #106905 (comment)

    import argparse
    import datetime
    import difflib
    import gc
    import random
    import sys
    import threading
    import time


    def do_some_work(iterations, sleep_before_work=False):
        if sleep_before_work:
            # Wait before starting the work, so more concurrent threads could be created
            time.sleep(1)
        for _ in range(iterations):
            seq_a = [str(random.randint(0, 10)) + "\n" for _ in range(100)]
            seq_b = [str(random.randint(0, 10)) + "\n" for _ in range(100)]
            udiff = list(difflib.unified_diff(seq_a, seq_b))
            udiff.append("THE END")
            fibonacci(16)


    def fibonacci(n):
        if n <= 2:
            return 1
        return fibonacci(n - 1) + fibonacci(n - 2)


    def gc_callback(phase, info):
        # Run Python code from inside the collector on every GC start/stop
        del phase, info
        do_some_work(1)


    gc.callbacks.append(gc_callback)


    class Runner:
        def __init__(self):
            self.watch = True

        def watch_frames(self):
            # Poll sys._current_frames() while the worker threads are running
            while self.watch:
                time.sleep(0.1)
                cf = sys._current_frames()
                print(
                    f">>>> {datetime.datetime.now()} number of frames: {len(cf)}",
                    flush=True,
                )

        def run(self, num_runs, num_threads, work_load):
            self.watch = True
            watcher = threading.Thread(target=self.watch_frames)
            watcher.start()
            for i in range(num_runs):
                start = time.time()
                print(
                    f">>>> {datetime.datetime.now()} Run: {i}/{num_runs} started",
                    flush=True,
                )
                ts = []
                for _ in range(num_threads):
                    t = threading.Thread(target=do_some_work, args=(work_load, True))
                    t.start()
                    ts.append(t)
                for t in ts:
                    t.join()
                print(
                    f">>>> {datetime.datetime.now()} Run: {i}/{num_runs} took"
                    f" {time.time() - start} seconds",
                    flush=True,
                )
            self.watch = False
            watcher.join()


    def main():
        print(f">>>> Running on {sys.version_info=}")
        print(f">>>> {datetime.datetime.now()} START")
        parser = argparse.ArgumentParser()
        parser.add_argument("--num_runs", type=int, default=1000)
        parser.add_argument("--num_threads", type=int, default=512)
        parser.add_argument("--work_load", type=int, default=1)
        ns = parser.parse_args()
        runner = Runner()
        runner.run(ns.num_runs, ns.num_threads, ns.work_load)
        print(f">>>> {datetime.datetime.now()} DONE")


    if __name__ == "__main__":
        main()
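Since the comment above says several copies had to run at the same time to hit the hang, here is a rough sketch of a launcher for that. The helper names `launch_copies` and `wait_for_all` and the file name `repro.py` are made up for illustration, and "did not exit within the timeout" is only a crude stand-in for "stopped producing output".

```python
import subprocess
import sys


def launch_copies(command, copies=8):
    """Start `copies` identical child processes and return their Popen handles."""
    return [subprocess.Popen(command) for _ in range(copies)]


def wait_for_all(processes, timeout):
    """Return the indexes of processes that did not exit in time.

    The timeout applies to each process in turn, so the total wait can be
    up to len(processes) * timeout seconds.
    """
    stuck = []
    for index, proc in enumerate(processes):
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            stuck.append(index)
    return stuck


if __name__ == "__main__":
    # "repro.py" is a hypothetical file name for the script above.
    procs = launch_copies([sys.executable, "repro.py"], copies=8)
    print("processes still running after the timeout:", wait_for_all(procs, 600))
```

Processes reported as stuck are left running, so a debugger such as gdb can still be attached to them.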
FWIW, backporting #97920 to 3.11 makes the above example no longer reproducible.
Should we temporarily disable the GC during |
When using threaded applications, there is a high risk of a deadlock in the interpreter. It's a lock ordering deadlock between HEAD_LOCK(&_PyRuntime) and the GIL. It has been suggested to disable GC during the _PyThread_CurrentFrames() and _PyThread_CurrentExceptions() calls. Jira: ENTLLT-7285 Change-Id: I2548d07803fc98db8717057ae3006e6af68b2f86
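For readers unfamiliar with the term, a lock ordering deadlock can be imitated with two ordinary locks. This is only an analogy, not interpreter code: the two `threading.Lock` objects merely stand in for HEAD_LOCK(&_PyRuntime) and the GIL, and the function name and the acquire timeout are assumptions added so the demo reports the stall instead of hanging.

```python
import threading


def simulate_lock_order_inversion(timeout=0.2):
    """Return, per thread, whether it managed to take its second lock."""
    head_lock = threading.Lock()  # stand-in for HEAD_LOCK(&_PyRuntime)
    gil = threading.Lock()        # stand-in for the GIL
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    acquired_second = {}

    def worker(name, first, second):
        with first:
            # Make sure each thread holds its first lock before either
            # one reaches for the second.
            both_hold_first.wait()
            got_it = second.acquire(timeout=timeout)
            acquired_second[name] = got_it
            # Keep holding the first lock until both attempts are recorded.
            both_tried_second.wait()
            if got_it:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("frames_thread", head_lock, gil)),
        threading.Thread(target=worker, args=("other_thread", gil, head_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquired_second


if __name__ == "__main__":
    print(simulate_lock_order_inversion())
```

Each thread takes one lock and then waits for the one the other thread holds, so neither attempt succeeds. Without the timeout, both threads would block forever, which is the shape of the hang described in this issue.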
@colesbury I've created PR #117332 to try to fix this issue by implementing what you suggested. Please let me know if this looks reasonable or if I am completely off-road :)
When using threaded applications, there is a high risk of a deadlock in the interpreter. It's a lock ordering deadlock between HEAD_LOCK(&_PyRuntime) and the GIL. Disabling the GC during the _PyThread_CurrentFrames() and _PyThread_CurrentExceptions() calls fixes the issue.
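For anyone stuck on an affected interpreter, the same idea can be approximated from Python code. The wrapper below is a sketch, not the change made in the C functions named above: `current_frames_gc_paused` is a made-up helper name, and whether it avoids the hang in practice is an untested assumption. Note that `gc.disable()` switches off automatic collection for the whole interpreter, so another thread calling `gc.enable()` or `gc.collect()` at the same moment would defeat it.

```python
import gc
import sys


def current_frames_gc_paused():
    """Call sys._current_frames() with automatic GC switched off.

    Automatic collection is re-enabled afterwards only if it was
    enabled when the function was entered.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return sys._current_frames()
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    frames = current_frames_gc_paused()
    print(f"number of frames: {len(frames)}")
```

In the reproducer earlier in this thread, the call inside `watch_frames` would be the place to swap this in.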
Bug report

When using sys._current_frames in a threaded application there is a high risk of a deadlock in the interpreter. Below is my analysis from one such hang. We have never seen this problem with Python 3.8 but experience it regularly with Python 3.11. I know that the sys._current_frames documentation warns about the function, and one solution to this bug report could be to document that the function is not thread safe.

We have a huge number of threads in this application, but I'll limit the output to the two threads that cause the deadlock.

We have a thread calling sys._current_frames with a Python stack trace:

and a corresponding C stack trace:

So, this thread does not hold the GIL but seems to hold the runtime->interpreters.mutex.

We have another thread that seems to hold the GIL:

See the native_thread_id in the following gdb output.

Your environment

Python 3.11.4 on a CentOS 7 machine with conda-forge binaries
Linked PRs