Fix SIGHUP on termination (Solaris patch 260-22964338)
This fixes the following bug filed with Solaris: "22964338 ksh93 appears to send SIGHUP to unrelated processes on occasion". It is fixed by applying this patch by Lijo George from the Solaris repo: https://github.com/oracle/solaris-userland/blob/master/components/ksh93/patches/260-22964338.patch

The ksh2020 upstream rejected this, but if it's in production use in Solaris, it's probably good enough for 93u+m. If any breakage is left, it can be fixed later. att#1

src/cmd/ksh93/include/jobs.h, src/cmd/ksh93/sh/fault.c, src/cmd/ksh93/sh/jobs.c:
- Use a new job_hup() function instead of job_kill() to send SIGHUP to job processes on termination. The new function checks if a job is in fact still live before issuing SIGHUP to it.
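The idea of checking liveness before sending SIGHUP can be illustrated with a minimal, standalone sketch. This is not the patch's code and does not use ksh93's data structures: `struct job`, `hup_if_live()`, `walk_jobs()` and `demo_hup_live_jobs()` are hypothetical names invented for illustration. The sketch assumes that a PID becomes unsafe to signal once the child has been reaped with `waitpid()`, since the system may hand that PID to an unrelated process afterwards.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job record; not ksh93's own process structure. */
struct job {
    pid_t pid;
    bool  reaped;   /* true once waitpid() has collected the child */
};

/*
 * Callback that sends SIGHUP only to jobs that have not been reaped.
 * The sig argument is unused; it is kept so the function matches the
 * uniform callback signature expected by walk_jobs().
 * Returns 1 if a signal was sent, 0 otherwise.
 */
static int hup_if_live(struct job *j, int sig)
{
    int status;
    (void)sig;
    if (j->pid <= 0)
        return 0;   /* never signal pid 0 or -1 (process groups / everyone) */
    /* Non-blocking check: has the child exited since we last looked? */
    if (!j->reaped && waitpid(j->pid, &status, WNOHANG) == j->pid)
        j->reaped = true;
    if (j->reaped)
        return 0;   /* the PID may already belong to someone else */
    return kill(j->pid, SIGHUP) == 0 ? 1 : 0;
}

/* Apply fn to every job through a function pointer; sum the results. */
static int walk_jobs(struct job *jobs, size_t n,
                     int (*fn)(struct job *, int), int arg)
{
    int count = 0;
    assert(fn != NULL);
    for (size_t i = 0; i < n; i++)
        count += fn(&jobs[i], arg);
    return count;
}

/* Demo: one finished job, one live job; only the live one gets SIGHUP. */
static int demo_hup_live_jobs(void)
{
    struct job jobs[2] = { { 0, false }, { 0, false } };
    int status, sent;

    signal(SIGHUP, SIG_DFL);    /* make sure children die on SIGHUP */

    jobs[0].pid = fork();
    if (jobs[0].pid < 0)
        return -1;
    if (jobs[0].pid == 0)
        _exit(0);               /* child 0 finishes immediately */
    if (waitpid(jobs[0].pid, &status, 0) == jobs[0].pid)
        jobs[0].reaped = true;  /* from here on its PID may be reused */

    jobs[1].pid = fork();
    if (jobs[1].pid < 0)
        return -1;
    if (jobs[1].pid == 0)
        for (;;)
            pause();            /* child 1 stays alive until signalled */

    sent = walk_jobs(jobs, 2, hup_if_live, SIGHUP);
    waitpid(jobs[1].pid, &status, 0);   /* collect the hung-up child */
    return sent;
}
```

Calling `demo_hup_live_jobs()` from a `main()` should report that exactly one signal was sent. Skipping the reaped job is what keeps an unrelated process that later reuses its PID from being hung up.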
Showing 3 changed files with 50 additions and 1 deletion.
62cf88d
This is the email on the old mailing list associated with this patch: https://www.mail-archive.com/[email protected]/msg01887.html
I tried using the C program provided in that email but could not replicate the bug on either Linux or Solaris. Is this patch actually necessary?
62cf88d
Good question. Thanks for the find. I will have to try to test this myself as well. I'm not comfortable reverting the patch until we have identified and understood the commit that made it unnecessary.
62cf88d
I have tested the reproducer for this patch again and couldn't reproduce the `SIGHUP` issue on any version of ksh at all, including ksh93u+. Additionally, the issues with the patch pointed out in att#1 are still valid. If we keep this patch then the now unused `job_terminate` function and the always true `if(pw->p_pgrp != 0)` check should both be removed.

IMO this patch should be reverted unless the `SIGHUP` bug can be successfully reproduced.
62cf88d
I just reproduced the bug on a Solaris 11.3 VM with a recent ksh 93u+m with this patch reverted. The steps in the mailing list message aren't all that clearly explained, so here's exactly what I did. Note I did all this on the Solaris console, as I disabled the GUI to reduce the load on my host system. I don't know if that makes a difference.
- `-m64 -Os -std=c99 -D_AST_ksh_release -D__EXTENSIONS__`
- `cat arch/sol.i386/bin/ksh > /bin/ksh93` (this will also catch /bin/sh and others via symlinks and hardlinks; the output redirection method ensures none of those links are broken)
- `gcc -m64 -o cpid cpid.c`
- `ssh localhost` and enter your password.
- `sleep 1 &` and let it finish. Note down the reported PID. That is the one we will reuse. Let's say 26650.
- `./cpid 26650` (the PID from the previous step). Now wait until it says "pid 26650 is ready"; it has now succeeded at re-using that PID, and will just sit there. This process will never voluntarily terminate. If we have the bug, the termination of this process will be the symptom.
- `~.`
- `cpid` has been terminated, reporting: `waitpid return 26650, status 0x0001`. This is the bug reproduced. (Note that `status 0x0001` refers to being killed by signal 1 which is SIGHUP.)

Then I undid the reversion of this patch, redid the whole procedure, and the bug was gone; cpid keeps sitting there in step 9.
I have not tried this on other systems, but it's probably reproducible on others as well.
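For readers unfamiliar with wait statuses, the reading of `status 0x0001` as "killed by signal 1" can be checked portably with the POSIX `WIFSIGNALED` and `WTERMSIG` macros instead of by inspecting the raw number. The following is a minimal sketch, not the `cpid.c` reproducer from the mailing list; `killed_by()` and `demo_hup_status()` are hypothetical helper names. How the raw status integer is laid out is implementation-specific, so the sketch prints it for comparison only and relies on the macros for the actual answer.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the signal that terminated the child, or 0 if it was not
 * terminated by a signal. */
static int killed_by(int status)
{
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Fork a child that waits forever, hang it up, and decode its status.
 * Returns the terminating signal number, or -1 on error. */
static int demo_hup_status(void)
{
    int status;
    pid_t pid;

    signal(SIGHUP, SIG_DFL);    /* ensure SIGHUP is not inherited as ignored */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        for (;;)
            pause();            /* child: never terminates voluntarily */

    if (kill(pid, SIGHUP) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* Raw value shown for comparison; its encoding is system-specific. */
    printf("waitpid return %ld, status 0x%04x\n", (long)pid, (unsigned)status);
    return killed_by(status);
}
```

If `demo_hup_status()` returns a value equal to `SIGHUP`, the child was terminated by the hangup signal, which is the same conclusion drawn from the reproducer's output above.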
62cf88d
Most of Siteshwar's review comments on att#1 made sense, though. There is quite a bit of no-op code in that function, presumably because it started out as a copy-paste of another function. I'm fixing that now, which is making sense of why this is in fact needed.
One comment is wrong, though: "We can remove the `sig` argument if it's dedicated to `SIGHUP` handling". No, we cannot; this function is called by `job_walk()` via a pointer so it must accept that argument even though it doesn't use it.

Also, yes, you're right, `job_terminate()` is now unused and should be removed.
62cf88d
Thanks for the better reproducer instructions. I finally got the bug to occur in my Solaris VM.
62cf88d
FWIW, I've now reproduced it on the stock /bin/ksh 93u+ 2012-08-01 on macOS as well. It should be a system-agnostic bug; this POSIXy signal handling stuff works the same everywhere. (And yes, the patch fixes it on that system as well.)
62cf88d
Fixed and documented in 6d3796c