This behavior was observed when testing the tst-sigwait.cc with the Linux dynamic linker.
When this portion of the test runs:
void handler2(int sig) { siguser2_received = 1; }
int main(int ac, char** av)
{
    ...
    // Not in set, should receive this one
    std::thread thread2([] { kill(0, SIGUSR2); });
    while (siguser2_received == 0);
    thread2.join();
the underlying glibc implementation of pthread_create(), invoked when creating thread2, blocks all signals before calling the clone syscall and then unblocks them again using rt_sigprocmask (see this). Depending on how thread2 and the parent thread get scheduled, SIGUSR2 may get delivered between the blocking and the unblocking.
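For comparison, here is a minimal sketch of what a POSIX system such as Linux does with a signal that arrives inside such a block/unblock window: the signal stays pending and the handler runs once the mask is restored. The function name is hypothetical, raise() stands in for the kill() from the other thread, and sigprocmask() stands in for the rt_sigprocmask calls made by glibc; this is an illustration only, not the glibc code.

```cpp
#include <signal.h>
#include <cassert>

static volatile sig_atomic_t received = 0;
static void on_usr2(int) { received = 1; }

// Hypothetical helper: returns true if SIGUSR2 raised while blocked
// stays pending and is delivered to the handler only after unblocking.
bool blocked_signal_is_delivered_on_unblock()
{
    received = 0;

    struct sigaction sa = {};
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, nullptr);

    // Block everything, as pthread_create() does around clone.
    sigset_t all, old;
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);

    // The signal arrives inside the window (assumption: raise() is
    // used here instead of kill() from a second thread).
    raise(SIGUSR2);

    sigset_t pending;
    sigpending(&pending);
    bool was_pending = sigismember(&pending, SIGUSR2) == 1;
    bool not_run_yet = (received == 0);

    // Restore the previous mask; the pending signal is delivered now.
    sigprocmask(SIG_SETMASK, &old, nullptr);

    return was_pending && not_run_yet && received == 1;
}
```

The sketch assumes SIGUSR2 was not already blocked by the caller. Calling blocked_signal_is_delivered_on_unblock() is expected to return true on Linux, which matches the observation that the test works there.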
This causes a problem on OSv, because its sigprocmask adds the parent thread to the list of waiters for SIGUSR2. Then, when kill(0, SIGUSR2) executes, it tries to wake the parent thread, fruitlessly, because that thread is not actually waiting for a signal. To make matters worse, because of the logic in kill() below, handler2() never gets executed and the test hangs forever waiting for the condition:
int wake_up_signal_waiters(int signo)
{
    SCOPE_LOCK(waiters_mutex);
    int woken = 0;
    unsigned sigidx = signo - 1;
    for (auto& t: waiters[sigidx]) {
        woken++;
        t->remote_thread_local_var<int>(thread_pending_signal) = signo;
        t->wake();
    }
    return woken;
}

int kill(pid_t pid, int sig)
{
    ...
    if ((pid == OSV_PID) || (pid == 0) || (pid == -1)) {
        // This semantically means signaling everybody. So we will signal
        // every thread that is waiting for this.
        //
        // The thread does not expect the signal handler to still be delivered,
        // so if we wake up some folks (usually just the one waiter), we should
        // not continue processing.
        if (wake_up_signal_waiters(sig)) {
            return 0;
        }
    }
    // User-defined signal handler. Run it in a new thread. This isn't
    // very Unix-like behavior, but if we assume that the program doesn't
    // care which of its threads handle the signal - why not just create
    // a completely new thread and run it there...
    // The newly created thread is tagged as an application one
    // to make sure that user provided signal handler code has access to all
    // the features like syscall stack which matters for Golang apps
    const auto sa = signal_actions[sigidx];
    auto t = sched::thread::make([=] {
        if (sa.sa_flags & SA_RESETHAND) {
            signal_actions[sigidx].sa_flags = 0;
            signal_actions[sigidx].sa_handler = SIG_DFL;
        }
BTW this test works fine on Linux.
I think the problem is that the original implementation of sigwait() (see ee0e618) assumed that a thread blocking a signal would eventually call sigwait() to receive it. But as the test running with the glibc pthread implementation illustrates, the temporary blocking of SIGUSR2 by __pthread_create_2_1 adds the parent thread to the waiters list; that thread then gets woken and the handler does not get executed.
I am not clear on what exactly this comment means:
// The thread does not expect the signal handler to still be delivered,
// so if we wake up some folks (usually just the one waiter), we should
// not continue processing.
if (wake_up_signal_waiters(sig)) {
    return 0;
}
but maybe we should not stop processing if we know nobody was really woken. Maybe waiters should really be called potential_consumers or something like that. Then sigwait() should add the current thread to a list of real waiters, and wake_up_signal_waiters() should be tweaked to count only the consumers that were really sigwait()-ing, no?
I may be all wrong about it.
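To make the idea concrete, here is a toy model of the proposed split between threads that merely block a signal and threads that are really inside sigwait(). It is not OSv code: SignalModel, its members and the integer thread ids are all hypothetical, and the model ignores locking, pending-signal state and per-signal indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Toy model of one signal (hypothetical, for illustration only).
struct SignalModel {
    std::set<int> potential_consumers; // threads that block the signal
    std::set<int> sigwaiters;          // threads currently inside sigwait()
    int handler_runs = 0;

    void block(int tid)         { potential_consumers.insert(tid); }
    void unblock(int tid)       { potential_consumers.erase(tid); sigwaiters.erase(tid); }
    void begin_sigwait(int tid) { sigwaiters.insert(tid); }
    void end_sigwait(int tid)   { sigwaiters.erase(tid); }

    // Counts only threads that are really sigwait()-ing.
    std::size_t wake_up_signal_waiters() const { return sigwaiters.size(); }

    // Models kill(): returns true if the user handler was run.
    bool kill()
    {
        if (wake_up_signal_waiters() > 0) {
            return false;   // a real waiter consumed the signal
        }
        ++handler_runs;     // nobody was really woken: run the handler
        return true;
    }
};
```

In this model, a thread that only blocks the signal (as the parent does briefly inside pthread_create()) no longer suppresses the handler: after block(1), kill() returns true. Once begin_sigwait(1) has been called, kill() returns false and the handler is skipped, which is the case the existing comment seems to describe.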
Adding relevant OSv strace output fragment: