Kill descendant processes in core.direct schedulers plugin #6572

Open: wants to merge 4 commits into base: main
Changes from 2 commits

15 changes: 12 additions & 3 deletions src/aiida/schedulers/plugins/direct.py
@@ -354,9 +354,18 @@ def _parse_submit_output(self, retval, stdout, stderr):

         return stdout.strip()
 
-    def _get_kill_command(self, jobid):
-        """Return the command to kill the job with specified jobid."""
-        submit_command = f'kill {jobid}'
+    def _get_kill_command(self, process_id):
Contributor

By renaming jobid to process_id you broke the log line on line 370, which still references jobid. Either keep the name jobid or adapt the other lines that reference it. This would be a breaking change, but since this is an internal method, it is OK to change.

+        """Return the command to kill the process with specified pid and all its descendants."""
agoscinski marked this conversation as resolved.
+        from psutil import Process
+
+        # get a list of the process id of all descendants
+        process = Process(int(process_id))
+        children = process.children(recursive=True)
+        process_ids = [process_id]
Contributor

I think you should cast to str here explicitly to be safe. Before, it was used in an f-string, which automatically casts, but now you are using it as arguments to ' '.join() which will fail if the elements are not all strings.

Suggested change
-        process_ids = [process_id]
+        process_ids = [str(process_id)]
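
The failure mode described in this comment can be reproduced in isolation. The following is a minimal sketch with made-up PID values, not code from the PR: `str.join` raises `TypeError` as soon as one element is not a string, whereas an f-string converts its arguments implicitly.

```python
process_id = 1234  # hypothetical PID, held as an int for illustration
child_pids = ['1235', '1236']  # already cast to str, as in the diff

# f-strings convert their arguments implicitly, so the old code always worked.
old_command = f'kill {process_id}'

# str.join does not convert: a single int in the list raises TypeError.
try:
    ' '.join([process_id] + child_pids)
    join_failed = False
except TypeError:
    join_failed = True

# Casting explicitly, as suggested, makes the join safe for int or str input.
process_ids = [str(process_id)] + child_pids
new_command = f"kill {' '.join(process_ids)}"
```

If `process_id` always arrives as a string the cast is a no-op, so applying it unconditionally costs nothing.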

+        process_ids.extend([str(child.pid) for child in children])
+        process_ids_str = ' '.join(process_ids)
agoscinski marked this conversation as resolved.

+        submit_command = f'kill {process_ids_str}'
Contributor

Just a side note:
I've encountered cases where kill PID silently returns without actually killing the job.
I would suggest handling this scenario: if the PID still exists after sending kill PID,
inform the user with a log message.
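
One way to approach the check suggested here is sketched below. It makes assumptions the PR does not confirm: that the check runs on the same POSIX machine as the processes, and that polling for a short time is acceptable. `pid_exists` and `warn_if_still_alive` are hypothetical helper names, not part of AiiDA.

```python
import logging
import os
import time

logger = logging.getLogger(__name__)


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def warn_if_still_alive(pids, timeout=2.0, poll_interval=0.1):
    """Hypothetical helper: log and return the PIDs that outlive `timeout`."""
    deadline = time.monotonic() + timeout
    remaining = [pid for pid in pids if pid_exists(pid)]
    # Poll until every process is gone or the grace period has elapsed.
    while remaining and time.monotonic() < deadline:
        time.sleep(poll_interval)
        remaining = [pid for pid in remaining if pid_exists(pid)]
    for pid in remaining:
        logger.warning('process %s still exists after kill was sent', pid)
    return remaining
```

A caller could run `warn_if_still_alive` on the same PIDs right after issuing the kill command. Whether to escalate afterwards (for example with a stronger signal) is a separate design decision that this sketch leaves open.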

Contributor

Might as well take the opportunity to fix the variable name.

Suggested change
-        submit_command = f'kill {process_ids_str}'
+        kill_command = f'kill {process_ids_str}'


         self.logger.info(f'killing job {jobid}')
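
Taken together, the review suggestions in this thread (explicit `str` cast, `kill_command` naming) could be combined roughly as follows. This is a sketch under stated assumptions, not the code that ended up in the PR: `build_kill_command`, `descendant_pids`, and the `get_descendants` parameter are hypothetical names, introduced so that the string-building logic can be exercised without psutil installed. The psutil calls themselves (`Process`, `children(recursive=True)`, `.pid`) are the ones used in the diff.

```python
def descendant_pids(pid: int) -> list[int]:
    """Return the PIDs of all descendants of `pid`, using psutil as the diff does."""
    from psutil import Process  # third-party dependency, imported lazily

    return [child.pid for child in Process(pid).children(recursive=True)]


def build_kill_command(process_id, get_descendants=descendant_pids) -> str:
    """Hypothetical standalone version of `_get_kill_command`."""
    # Cast explicitly so that both int and str PIDs can be joined.
    process_ids = [str(process_id)]
    process_ids.extend(str(pid) for pid in get_descendants(int(process_id)))
    kill_command = f"kill {' '.join(process_ids)}"
    return kill_command
```

Passing the descendant lookup as a parameter is purely a testing convenience; for example, `build_kill_command('100', get_descendants=lambda pid: [101, 102])` yields a command that lists the parent first and then its descendants.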
