Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging/Monitoring PaSh spawned processes #500

Open
angelhof opened this issue Mar 14, 2022 · 2 comments
Open

Logging/Monitoring PaSh spawned processes #500

angelhof opened this issue Mar 14, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@angelhof
Copy link
Member

Currently, it is pretty hard to have any visibility on the dataflow processes that are spawned by PaSh. It would be great if we could have some form of observability of their status/progress.

One way to do it would be to log events related to each of these processes (running, progress (based on INPUT/OUTPUT processed), exit status, etc). Then we could develop a separate extension to analyze and show these logs.

For more info see discussion in #491

@angelhof angelhof added the enhancement New feature or request label Mar 14, 2022
@asharma0703
Copy link

Do you have any ideas for how these events could be logged? Not super familiar with prior art here but from some cursory searches I think adding an strace option in debug mode might make sense.

For example if you prepend strace -Y -f -e execve to the args for execution of the compiled script in compiler/pash.py, and run the input from #491, you get some output like this (in between all the spawns/exits during setup/teardown):

[pid 124519<bash>] +++ exited with 0 +++
[pid 124507<auto-split.sh>] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=124519<bash>, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 124522 attached
[pid 124522<bash>] execve("/home/arnav/pash/runtime/split", ["/home/arnav/pash/runtime/split", "/tmp/pash_1Zz3NvgG6J", "107860", "/tmp/pash_EzzpX5c/1c722115196647"..., "/tmp/pash_EzzpX5c/1c722115196647"...], 0x56012cd5f380 /* 93 vars */) = 0
[pid 124509<bash>] execve("/usr/bin/tr", ["tr", "-s", " ", ""], 0x55ad1e5942a0 /* 93 vars */ <unfinished ...>
[pid 124508<bash>] execve("/usr/bin/tr", ["tr", "-s", " ", ""], 0x55ad1e5941c0 /* 93 vars */ <unfinished ...>
[pid 124509<tr>] <... execve resumed>)  = 0
[pid 124508<tr>] <... execve resumed>)  = 0
tr: tr: when not truncating set1, string2 must be non-emptywhen not truncating set1, string2 must be non-empty

[pid 124509<tr>] +++ exited with 1 +++
[pid 124508<tr>] +++ exited with 1 +++
[pid 124517<eager>] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=124517<eager>, si_uid=1000} ---
[pid 124517<eager>] +++ killed by SIGPIPE +++
[pid 124522<split>] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=124522<split>, si_uid=1000} ---
[pid 124518<eager>] +++ exited with 0 +++
[pid 124522<split>] +++ killed by SIGPIPE +++
[pid 124507<auto-split.sh>] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=124522, si_uid=1000, si_status=SIGPIPE, si_utime=0, si_stime=0} ---
[pid 124511<eager.sh>] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=124518, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
[pid 124510<eager.sh>] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=124517, si_uid=1000, si_status=SIGPIPE, si_utime=0, si_stime=0} ---
strace: Process 124523 attached
[pid 124515<eager>] +++ exited with 0 +++
[pid 124513<cat>] +++ exited with 0 +++
[pid 124512<eager.sh>] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=124515, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
[pid 124473<env>] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=124506, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
[pid 124523<bash>] execve("/usr/bin/rm", ["rm", "-f", "/tmp/pash_1Zz3NvgG6J"], 0x56012cd583c0 /* 93 vars */strace: Process 124524 attached
) = 0

It's verbose / not the easiest to parse, and probably something to include only in debug mode due to strace performance impact. It also doesn't really address the issue of progress on input data. Might help debugging issues with exit codes like the above though.

@angelhof
Copy link
Member Author

Thanks a lot for this suggestion! That would be a great way to debug these issues, and we could only enable it when doing heavyweight fine-grained debugging.

Do you have any ideas for how these events could be logged? Not super familiar with prior art here but from some cursory searches I think adding an strace option in debug mode might make sense.

Something that would also be possible for a more lightweight instrumentation would be to do the modifications at the script level, i.e., modifying the produced parallel script to do the following:

  1. Monitor process status
  2. Add some form of monitoring command (like pv) between commands in the dataflow to monitor throughput etc.

For reference, the parallel script produced by PaSh can be seen if PaSh is called using the -d 1 option. For example:

$ cat test.sh
cat README.md | tr A-Z a-z | sort
$ ./pa.sh -d 1 test.sh > /dev/null
...
rm_pash_fifos() {
{ rm -f "/tmp/pash_CODhJWu/63f16760c5ba4ff5ad1b8707209a2dcc/#fifo2" ; }
  ...
}
mkfifo_pash_fifos() {
{ mkfifo "/tmp/pash_CODhJWu/63f16760c5ba4ff5ad1b8707209a2dcc/#fifo2" ; }
  ...
}
rm_pash_fifos
mkfifo_pash_fifos
pids_to_kill=""
{ cat README.md >"/tmp/pash_CODhJWu/63f16760c5ba4ff5ad1b8707209a2dcc/#fifo2" & }
pids_to_kill="${!} ${pids_to_kill}"
{ auto-split.sh "/tmp/pash_CODhJWu/63f16760c5ba4ff5ad1b8707209a2dcc/#fifo2" "/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo7" "/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo8" & }
pids_to_kill="${!} ${pids_to_kill}"
{ tr A-Z a-z <"/tmp/pash_CODhJWu/4b003a3b930540ab95504f2823895894/#fifo15" >"/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo10" & }
pids_to_kill="${!} ${pids_to_kill}"
{ tr A-Z a-z <"/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo8" >"/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo11" & }
pids_to_kill="${!} ${pids_to_kill}"
{ sort <"/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo10" >"/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo12" & }
pids_to_kill="${!} ${pids_to_kill}"
{ sort <"/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo11" >"/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo13" & }
pids_to_kill="${!} ${pids_to_kill}"
{ eager.sh "/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo7" "/tmp/pash_CODhJWu/4b003a3b930540ab95504f2823895894/#fifo15" /tmp/pash_CODhJWu/469d17904a62453eb710dbb9e7e13554/pash_eager_intermediate_1 & }
pids_to_kill="${!} ${pids_to_kill}"
{ eager.sh "/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo12" "/tmp/pash_CODhJWu/4b003a3b930540ab95504f2823895894/#fifo16" /tmp/pash_CODhJWu/469d17904a62453eb710dbb9e7e13554/pash_eager_intermediate_2 & }
pids_to_kill="${!} ${pids_to_kill}"
{ runtime/eager.sh "/tmp/pash_CODhJWu/ba84f90ccccd40a88a9431e7ee2cb1a2/#fifo13" "/tmp/pash_CODhJWu/4b003a3b930540ab95504f2823895894/#fifo17" /tmp/pash_CODhJWu/469d17904a62453eb710dbb9e7e13554/pash_eager_intermediate_3 & }
pids_to_kill="${!} ${pids_to_kill}"
{ sort -m "/tmp/pash_CODhJWu/4b003a3b930540ab95504f2823895894/#fifo16" "/tmp/pash_CODhJWu/4b003a3b930540ab95504f2823895894/#fifo17" & }
pids_to_kill="${!} ${pids_to_kill}"
source wait_for_output_and_sigpipe_rest.sh ${!}
rm_pash_fifos
( exit "${internal_exec_status}" )
...

In this parallel script we could add pv nodes in the place of every fifo, and monitor the spawned commands (since we already have their pids). I am not sure if that introduce any other issues though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants