BackgroundCommand signals don't propagate through some shell setups #645

mrkajetanp · 2023-08-29T09:44:52Z

On some platforms (e.g. Android with su coming from Magisk) there seems to be
an issue where BackgroundCommand.send_signal() does not propagate through the
outer sh layers to the inner process. The same implementation that works okay
whether with adb root or just with su coming from Android breaks on the
Magisk setup. Manually finding the PID of the inner process and sending the
signal to it on the Magisk setup works around the problem which strongly
indicates it's a signal propagation issue.

The process hierarchy ends up looking as follows:

oriole:/ $ ps -e | grep 17488
root         17488 17483 10830432  3176 0                   0 S sh        <-- parent 1
root         17491 17488     876    384 0                   0 S trace-cmd <--  the PID we need to SIGINT
oriole:/ $ ps -e | grep 17483
root         17483 17479 10826332  3176 0                   0 S sh        <-- parent 2
root         17488 17483 10830432  3176 0                   0 S sh
oriole:/ $ ps -e | grep 17479
shell        17479 17477 10852960  3172 sigsuspend          0 S sh        <-- parent 3
root         17483 17479 10826332  3176 0                   0 S sh
oriole:/ $ ps -e | grep 17477
shell        17477   882 10791520  3172 sigsuspend          0 S sh        <-- parent 4, the one held in bg_cmd.pid
shell        17479 17477 10852960  3172 sigsuspend          0 S sh

Other platforms might potentially be affected as well.

The text was updated successfully, but these errors were encountered:

douglas-raillard-arm · 2023-08-29T12:48:26Z

After discussing offline with @mrkajetanp , it seems that a solution might be to change the PID detection code:

We would have a "wrapper" pid and a list of children pids for each background command
The pid mandatory BackgroundCommand attribute would be the item in pids list when the list has only one item. Otherwise, this will trigger an AttributeError with an appropriate message.
For SIGKILL, the signal would be sent to the wrapper and the children pids as well.
Other signals would be sent to all the children pids.

In order to detect what is a child PID and what is a wrapper layer, we can try to manipulate the busybox sh -c process name layer (e.g. by using an intermediate script that simply exec busybox sh -c "$@"'). Since the last layer should always be a devlib-made sh -c layer that should be enough to detect the internal wrapper layers.

douglas-raillard-arm · 2023-08-29T12:52:35Z

Also it's important to note that the signal propagation is not limited to su: shells can have complicated rules, e.g. the propagation of SIGINT with bash: https://www.gnu.org/software/bash/manual/html_node/Signals.html

When Bash is running without job control enabled and receives SIGINT while waiting for a foreground command, it waits until that foreground command terminates and then decides what to do about the SIGINT:

If the command terminates due to the SIGINT, Bash concludes that the user meant to end the entire script, and acts on the SIGINT (e.g., by running a SIGINT trap or exiting itself);

If the pipeline does not terminate due to SIGINT, the program handled the SIGINT itself and did not treat it as a fatal signal. In that case, Bash does not treat SIGINT as a fatal signal, either, instead assuming that the SIGINT was used as part of the program’s normal operation (e.g., emacs uses it to abort editing commands) or deliberately discarded. However, Bash will run any trap set on SIGINT, as it does with any other trapped signal it receives while it is waiting for the foreground command to complete, for compatibility.

So this problem needs to be solved regardless of what su is used.

In order to make su more reliable, we could make busybox setuid and then use busybox su, but that does not seem necessary for now if the PID detection is fixed.

douglas-raillard-arm · 2024-03-07T16:47:34Z

Another avenue would be to insert an extra bit of shell script before the user-provided command:

function __devlib_get_pids() {
   while true; do
       # <insert here a blocking command we can remotely unblock>
       cat /proc/$$/task/*/children > /path/to/a/file/we/know/about/with/unique/UUID
   done
}
__devlib_get_pids &
<insert here the user-provided snippet>

Then devlib can unblock the loop every time it wants an up-to-date list of PIDs. @cloehle raised the issue of the user snippet waiting for all its children (which would wait forever if that includes the function we spawned). This can maybe be worked around by wrapping the user snippet in an sh -c layer. We then know that the PIDs we are looking for are the immediate children of that specific layer.

douglas-raillard-arm · 2024-03-27T14:19:17Z

After some thinking, I feel like the best solution is probably the most explicit one: the user-provided shell snippet should have access to a shell function that allows declaring what is the "PID of that snippet". This would then be collected by the Python part and signals would be delivered to that specific task. Code that wouldn't use that helper would be limited to sending SIGKILL to the command (as it de facto is today):

# The process spawned by devlib-run would have its PID made available to the Python API
target.background('if [ foo ]; then devlib-run mycommand; else foobar; fi')

This allows arbitrarily complex shell snippet to still be handled in the most sensible way by letting the user declare the "main event" in the background command. This is particularly important for signals like SIGUSR1, that the main command might handle in a special way, but may result in terminating any other helper commands that happens to be executed at that time in the snippet if the signal was delivered to all processes.

In terms of implementation, we can easily pre-pend something like that to the user-provided shell command:

function devlib-run {
   # do something based on a UUID generated by Target.background() so that the Python code can retrieve the info somehow, e.g. with a temp file containing the PID name or something like that
}

@marcbonnici what do you think about this solution ?

douglas-raillard-arm · 2024-05-30T11:07:50Z

Started working on it

mrkajetanp changed the title ~~BackgroundCommand signals don't propagate when using Magisk su on Android~~ BackgroundCommand signals don't propagate through some shell setups Sep 11, 2023

douglas-raillard-arm mentioned this issue Mar 27, 2024

ftrace: Support trace-cmd record mode #643

Open

douglas-raillard-arm linked a pull request May 30, 2024 that will close this issue

[Draft] Fix BackgroundCommand.send_signal() and add trace-cmd record support #689

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BackgroundCommand signals don't propagate through some shell setups #645

BackgroundCommand signals don't propagate through some shell setups #645

mrkajetanp commented Aug 29, 2023 •

edited

Loading

douglas-raillard-arm commented Aug 29, 2023

douglas-raillard-arm commented Aug 29, 2023

douglas-raillard-arm commented Mar 7, 2024 •

edited

Loading

douglas-raillard-arm commented Mar 27, 2024

douglas-raillard-arm commented May 30, 2024

BackgroundCommand signals don't propagate through some shell setups #645

BackgroundCommand signals don't propagate through some shell setups #645

Comments

mrkajetanp commented Aug 29, 2023 • edited Loading

douglas-raillard-arm commented Aug 29, 2023

douglas-raillard-arm commented Aug 29, 2023

douglas-raillard-arm commented Mar 7, 2024 • edited Loading

douglas-raillard-arm commented Mar 27, 2024

douglas-raillard-arm commented May 30, 2024

mrkajetanp commented Aug 29, 2023 •

edited

Loading

douglas-raillard-arm commented Mar 7, 2024 •

edited

Loading