@kcibul reports that if the CNNScoreVariants Python code throws an exception during async batch processing, the GATK tool hangs. Specifically, it was happening when GATK was sending a `.` for a missing annotation, and the Python code was trying to interpret that as a number and blowing up.
It looks like this happens because `StreamingPythonScriptExecutor::waitForPreviousBatchCompletion` waits for the async write thread's `Future` to complete before checking the fifo for an `ACK`/`NCK` (which is where the exception would be propagated). Once the Python code throws, it stops reading from the fifo. The fifo then fills up, the async write thread blocks, its `Future` never completes, and the Java side hangs waiting for it.
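To make the hang concrete, here is a minimal toy model of it. This is not GATK code, and the class and method names (`FifoHangDemo`, `writerBlocksAfterConsumerFails`) are hypothetical. An `ArrayBlockingQueue` with capacity 2 stands in for the data fifo and a plain thread stands in for the Python side. The consumer fails on `.` and stops reading, so the async writer blocks in `put()`. A 500 ms timeout on `Future.get` stands in for "waiting forever".

```java
import java.util.concurrent.*;

/** Hypothetical toy model of the hang; not GATK's implementation. */
public class FifoHangDemo {

    /** Returns true if waiting on the writer Future first times out, i.e. would hang. */
    public static boolean writerBlocksAfterConsumerFails() {
        BlockingQueue<String> fifo = new ArrayBlockingQueue<>(2); // stand-in for the data fifo
        String[] batch = {"1.5", ".", "2.0", "3.0", "4.0", "5.0"};

        // Stand-in for the Python side: parses records until one fails, then stops reading.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Double.parseDouble(fifo.take()); // "." throws NumberFormatException
                }
            } catch (NumberFormatException | InterruptedException e) {
                // Consumer is dead; nothing drains the fifo any more.
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Async writer: blocks in put() once the fifo is full and nobody reads it.
        Future<?> writeFuture = pool.submit(() -> {
            for (String record : batch) {
                fifo.put(record);
            }
            return null;
        });

        try {
            // Problematic order: wait on the writer Future before looking for an ACK/NCK.
            writeFuture.get(500, TimeUnit.MILLISECONDS);
            return false; // writer finished, no hang
        } catch (TimeoutException e) {
            return true; // writer is stuck in put(); without a timeout this waits forever
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the blocked put() so the demo can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("writer hung: " + writerBlocksAfterConsumerFails());
    }
}
```

Running `main` prints `writer hung: true`: the consumer drains at most two records before failing, the queue holds two more, and the writer blocks on the fifth.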
The solution is to reverse the order of the checks in `waitForPreviousBatchCompletion`: check for the ack first, then validate that the async write `Future` has completed. There is a branch with a test and a fix for `StreamingPythonScriptExecutor`, and a separate branch with a test for CNNScoreVariants that also includes the executor fix. I need to verify that the CNNScoreVariants test actually fails without the fix; then this can be turned into a PR, which I'll do when I return from vacation.
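A sketch of the reversed ordering, again as a toy model with hypothetical names. Only the method name `waitForPreviousBatchCompletion` comes from this issue; the real GATK method has a different signature and protocol. The ack is read first. On an `NCK`, the sketch cancels the blocked writer and throws, which is an assumption about how the fix should clean up. On an `ACK`, it then validates that the write `Future` completed. The `"ack"`/`"nck"` strings are also assumed.

```java
import java.util.concurrent.*;

/** Hypothetical toy model of the ack-first fix; not GATK's implementation. */
public class AckFirstDemo {

    /** Hypothetical stand-in for the error raised when the Python side reports NCK. */
    static class PythonBatchException extends RuntimeException {
        PythonBatchException(String message) { super(message); }
    }

    /** Simplified, hypothetical version: ack first, then validate the writer Future. */
    static void waitForPreviousBatchCompletion(BlockingQueue<String> ackFifo, Future<?> writeFuture)
            throws InterruptedException, ExecutionException {
        String ack = ackFifo.take(); // 1. check for ACK/NCK first
        if (ack.startsWith("nck")) {
            writeFuture.cancel(true); // assumption: interrupt the writer stuck on the full fifo
            throw new PythonBatchException(ack);
        }
        writeFuture.get(); // 2. on ACK, the writer has finished (or is about to)
    }

    /** Returns the NCK message surfaced to the caller, or null if the batch succeeded. */
    public static String runBatch(String[] batch) {
        BlockingQueue<String> fifo = new ArrayBlockingQueue<>(2);      // stand-in data fifo
        BlockingQueue<String> ackFifo = new LinkedBlockingQueue<>();   // stand-in ack fifo

        // Stand-in for the Python side: parse every record, then ACK; on failure, NCK and stop.
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < batch.length; i++) {
                    Double.parseDouble(fifo.take());
                }
                ackFifo.put("ack");
            } catch (NumberFormatException e) {
                ackFifo.add("nck: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> writeFuture = pool.submit(() -> {
            for (String record : batch) {
                fifo.put(record);
            }
            return null;
        });

        try {
            waitForPreviousBatchCompletion(ackFifo, writeFuture);
            return null;
        } catch (PythonBatchException e) {
            return e.getMessage(); // the Python-side failure reaches Java instead of hanging
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBatch(new String[] {"1.5", "2.0"}));
        System.out.println(runBatch(new String[] {"1.5", ".", "2.0", "3.0", "4.0", "5.0"}));
    }
}
```

The first call prints `null` (the batch was acked). The second returns promptly with an `nck: ...` message instead of hanging, even though the writer is blocked on the full fifo at that point.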