[lldb] Fix the flakey Concurrent tests on macOS #81710
The concurrent tests all do a pthread_join at the end, and concurrent_base.py stops after that pthread_join and sanity checks that only 1 thread is running. On macOS, after pthread_join() has completed, there can be an extra thread still running which is completing the details of that task asynchronously; this causes testsuite failures. When this happens, we see the second thread is in

    frame #0: 0x0000000180ce7700 libsystem_kernel.dylib`__ulock_wake + 8
    frame #1: 0x0000000180d25ad4 libsystem_pthread.dylib`_pthread_joiner_wake + 52
    frame #2: 0x0000000180d23c18 libsystem_pthread.dylib`_pthread_terminate + 384
    frame #3: 0x0000000180d23a98 libsystem_pthread.dylib`_pthread_terminate_invoke + 92
    frame #4: 0x0000000180d26740 libsystem_pthread.dylib`_pthread_exit + 112
    frame #5: 0x0000000180d26040 libsystem_pthread.dylib`_pthread_start + 148

None of the functions from the test file are present on this thread.

In this patch, instead of counting the number of threads, I iterate over the threads looking for functions from our test file (by name) and only count threads that have at least one of them.

It's a lower frequency failure than the darwin kernel bug causing an extra step instruction mach exception when hardware breakpoints/watchpoints are used, but once I fixed that, this came up as the next most common failure for these tests.

rdar://110555062
@llvm/pr-subscribers-lldb
Author: Jason Molenda (jasonmolenda)
Full diff: https://github.com/llvm/llvm-project/pull/81710.diff
1 Files Affected:
diff --git a/lldb/packages/Python/lldbsuite/test/concurrent_base.py b/lldb/packages/Python/lldbsuite/test/concurrent_base.py
index 39eb27fd997471..46d71666d06977 100644
--- a/lldb/packages/Python/lldbsuite/test/concurrent_base.py
+++ b/lldb/packages/Python/lldbsuite/test/concurrent_base.py
@@ -264,12 +264,40 @@ def do_thread_actions(
             "Expected main thread (finish) breakpoint to be hit once",
         )
-        num_threads = self.inferior_process.GetNumThreads()
+        # There should be a single active thread (the main one) which hit
+        # the breakpoint after joining. Depending on the pthread
+        # implementation we may have a worker thread finishing the pthread_join()
+        # after it has returned. Filter the threads to only count those
+        # with user functions on them from our test case file,
+        # lldb/test/API/functionalities/thread/concurrent_events/main.cpp
+        user_code_funcnames = [
+            "breakpoint_func",
+            "crash_func",
+            "do_action_args",
+            "dotest",
+            "main",
+            "register_signal_handler",
+            "signal_func",
+            "sigusr1_handler",
+            "start_threads",
+            "watchpoint_func",
+        ]
+        num_threads_with_usercode = 0
+        for t in self.inferior_process.threads:
+            thread_has_user_code = False
+            for f in t.frames:
+                for funcname in user_code_funcnames:
+                    if funcname in f.GetDisplayFunctionName():
+                        thread_has_user_code = True
+                        break
+            if thread_has_user_code:
+                num_threads_with_usercode += 1
+
         self.assertEqual(
             1,
-            num_threads,
+            num_threads_with_usercode,
             "Expecting 1 thread but seeing %d. Details:%s"
-            % (num_threads, "\n\t".join(self.describe_threads())),
+            % (num_threads_with_usercode, "\n\t".join(self.describe_threads())),
         )
         self.runCmd("continue")
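
For anyone who wants to poke at the counting logic without a running debug session, here is a minimal sketch that mirrors the filter in the diff above. `FakeFrame`, `FakeThread`, `thread_has_user_code`, and `count_threads_with_user_code` are hypothetical stand-ins and helpers, not part of lldb or of this patch; they expose only the `frames` attribute and `GetDisplayFunctionName()` call the patch uses. The `or ""` guard for a frame that reports no name is an added assumption that the patch itself does not make.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Function names from main.cpp, copied from the list in the patch.
USER_CODE_FUNCNAMES = [
    "breakpoint_func", "crash_func", "do_action_args", "dotest", "main",
    "register_signal_handler", "signal_func", "sigusr1_handler",
    "start_threads", "watchpoint_func",
]


@dataclass
class FakeFrame:
    """Hypothetical stand-in for a stack frame; not an lldb class."""
    name: Optional[str]

    def GetDisplayFunctionName(self) -> Optional[str]:
        return self.name


@dataclass
class FakeThread:
    """Hypothetical stand-in for a thread; only exposes `frames`."""
    frames: List[FakeFrame] = field(default_factory=list)


def thread_has_user_code(thread, funcnames=USER_CODE_FUNCNAMES) -> bool:
    """True if any frame's display name contains one of the test-file names."""
    for frame in thread.frames:
        # Assumption: treat a missing function name as an empty string.
        name = frame.GetDisplayFunctionName() or ""
        if any(funcname in name for funcname in funcnames):
            return True
    return False


def count_threads_with_user_code(threads, funcnames=USER_CODE_FUNCNAMES) -> int:
    """Count only the threads that have at least one user-code frame."""
    return sum(1 for t in threads if thread_has_user_code(t, funcnames))


# The main thread, stopped after pthread_join().
main_thread = FakeThread([FakeFrame("main")])

# The leftover worker thread, using the frame names from the backtrace
# in the PR description.
exiting_thread = FakeThread([
    FakeFrame(n) for n in (
        "__ulock_wake", "_pthread_joiner_wake", "_pthread_terminate",
        "_pthread_terminate_invoke", "_pthread_exit", "_pthread_start",
    )
])

print(count_threads_with_user_code([main_thread, exiting_thread]))
```

With these two stand-in threads, a plain thread count would be 2, while the filtered count only includes the thread that has a test-file function on its stack. Note that the match is a substring test, as in the patch, so a name like `main` also matches any display name that merely contains it.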
LGTM
The concurrent tests all do a pthread_join at the end, and concurrent_base.py stops after that pthread_join and sanity checks that only 1 thread is running. On macOS, after pthread_join() has completed, there can be an extra thread still running which is completing the details of that task asynchronously; this causes testsuite failures. When this happens, we see the second thread is in

```
frame #0: 0x0000000180ce7700 libsystem_kernel.dylib`__ulock_wake + 8
frame #1: 0x0000000180d25ad4 libsystem_pthread.dylib`_pthread_joiner_wake + 52
frame #2: 0x0000000180d23c18 libsystem_pthread.dylib`_pthread_terminate + 384
frame #3: 0x0000000180d23a98 libsystem_pthread.dylib`_pthread_terminate_invoke + 92
frame #4: 0x0000000180d26740 libsystem_pthread.dylib`_pthread_exit + 112
frame #5: 0x0000000180d26040 libsystem_pthread.dylib`_pthread_start + 148
```

None of the functions from the test file are present on this thread.

In this patch, instead of counting the number of threads, I iterate over the threads looking for functions from our test file (by name) and only count threads that have at least one of them.

It's a lower frequency failure than the darwin kernel bug causing an extra step instruction mach exception when hardware breakpoints/watchpoints are used, but once I fixed that, this came up as the next most common failure for these tests.

rdar://110555062

(cherry picked from commit dbc40b3)