Suspending jobs with SGE will kill job #1656
Comments
UPDATE: this is not the case. Therefore, I believe it's safe to ignore the
(from man qsub)
It looks
Bug report
Expected behavior and actual behavior
On our HPC, we use the SGE scheduler and have implemented the "long queue" as a subordinate queue. The jobs in a subordinate queue will get suspended (`s` state) when the "short queue" gets busy and will be resumed once it is no longer busy.

Due to the `-notify` option, SGE sends the `SIGUSR1` signal (code `138`) to the processes before suspending them. Nextflow considers this an error and kills the jobs.

Expected behaviour:
Keep the jobs running, as they did not fail - they will resume later.
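
For reference, the exit code 138 mentioned above follows the usual shell convention of reporting 128 plus the signal number for a process terminated by a signal. The snippet below is only a minimal sketch of that decoding, not anything taken from Nextflow; the helper name `signal_from_exit_status` is made up for illustration, and the mapping of 138 to `SIGUSR1` assumes a platform where `SIGUSR1` is signal 10 (as on Linux x86_64).

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name behind a shell-style exit status, or None.

    Hypothetical helper: shells report 128 + N when a child process was
    terminated by signal N, so any status of 128 or below is not a signal.
    """
    if status <= 128:
        return None
    try:
        # Signal numbers are platform specific; SIGUSR1 is 10 on Linux x86_64.
        return signal.Signals(status - 128).name
    except ValueError:
        # The number does not correspond to a named signal on this platform.
        return None


if __name__ == "__main__":
    # Derive the status from the local SIGUSR1 value instead of hard-coding 138.
    status = 128 + signal.SIGUSR1
    print(status, signal_from_exit_status(status))
```
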
Connection to previous issues:
It seems this was already discussed in #1001 some time ago and was closed without a solution, because at that time Nextflow didn't have a concept of suspended jobs. As far as I can tell, the `s` status has been implemented for SGE/UGE in #1536; however, Nextflow 20.04.1 still kills the jobs.

Possible solution:
I know that `SIGUSR1` (exit code 138) is also sent when a soft resource limit is reached, and Nextflow considers this worth killing the jobs. Either reconsider this entirely, or possibly make the behaviour configurable?
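
To make the "configurable" idea more concrete, here is a rough sketch of what such a policy could look like. It is purely illustrative and does not reflect how Nextflow handles exit statuses internally: the class `SignalPolicy`, its fields, and the `is_failure` method are all hypothetical names, and the only scheduler state treated as suspended is the `s` state described above.

```python
from dataclasses import dataclass, field


@dataclass
class SignalPolicy:
    """Hypothetical, user-configurable rule for signal-related exit statuses."""

    # Exit statuses to tolerate while the scheduler reports the job as suspended.
    tolerated_when_suspended: set = field(default_factory=lambda: {138})
    # Scheduler states counted as "suspended" (only `s` is taken from this report).
    suspended_states: set = field(default_factory=lambda: {"s"})

    def is_failure(self, exit_status, scheduler_state):
        """Decide whether a task should be treated as failed."""
        if exit_status == 0:
            return False
        suspended = scheduler_state in self.suspended_states
        if suspended and exit_status in self.tolerated_when_suspended:
            # The job is only suspended and is expected to resume later.
            return False
        return True


if __name__ == "__main__":
    policy = SignalPolicy()
    print(policy.is_failure(138, "s"))  # suspended job that received SIGUSR1
    print(policy.is_failure(138, "r"))  # running job, e.g. soft limit reached
```

With a rule like this, the soft-limit case (138 while the job is still running) would keep its current behaviour, and an empty `tolerated_when_suspended` set would restore today's strict handling.
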
Steps to reproduce the problem
Minimal nextflow script:
Program output
nextflow.log
Environment
Nextflow version: 20.04.1 build 5335
Java version: 1.8.0_231
Bash version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
CC @riederd