Possible deadlock with condition variables in Net::IMAP #14
Comments
This is a sample exception from our logs that led to the process not terminating:
Looking at the code, this would have been rescued in https://github.com/ruby/ruby/blob/c1af7b1e1d408f9796a5f46c9ed36bc5adea4aa2/lib/net/imap.rb#L1148, and both condition variables would have been woken by https://github.com/ruby/ruby/blob/c1af7b1e1d408f9796a5f46c9ed36bc5adea4aa2/lib/net/imap.rb#L1203-L1204.
I see what you mean here. I guess it's possible where the receiver thread rescues the Exception, sets
I haven't been able to reproduce this locally. The only commonality I've found is that it has been reported when running MailRoom in Google Kubernetes Engine (GKE).
When MailRoom is run in Kubernetes, we have found occasions where MailRoom appears to have attempted to stop running, but `Net::IMAP` is stuck waiting for threads (ruby/net-imap#14). This commit adds an HTTP liveness checker to enable detection of a terminated MailRoom pod.
@stanhu Could you show a Ruby-level backtrace using `rb_ps`, which is defined in the `.gdbinit` bundled with Ruby? gdbdump-ruby may also be useful.
@shugo A customer sent us this:
This does point to the Lines 1297 to 1301 in c04bf8f
Although, is this output only showing one thread?
It appears the deadlock is in the Lines 1055 to 1060 in c04bf8f
This does suggest that this Lines 1287 to 1294 in c04bf8f
If it did, it's possible the
We're using `Net::IMAP` via the MailRoom gem, and quite frequently we are seeing issues with the process not terminating even though we attempt to run `Thread#join` with a 60-second timeout.

A GDB backtrace shows that this is stuck waiting for a condition variable:

`Net::IMAP` uses several condition variables:

- `idle`: https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L965
- `get_tagged_response`: https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L1215
- `send_literal`: https://github.com/ruby/ruby/blob/48f324e92f9b36edc267f9871e35039cbd1c2eb9/lib/net/imap.rb#L1368

We're using a 60-second idle timeout for `idle`. However, the last two do NOT have a timeout, so it's possible we're getting stuck in one of those cases.

I noticed that `send_literal` only checks the state of `@exception` after the wait returns. Do we need to do this?

I also wonder if we need a timeout for these other condition variables.
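
To make the two questions above concrete, here is a minimal sketch of a wait that checks for a recorded failure before blocking and also bounds the wait with a timeout. This is not the `Net::IMAP` code: `ResponseWaiter`, `fail_with`, `complete`, and `wait` are hypothetical names invented for illustration, and the sketch assumes the failure is stored under the same mutex that guards the condition variable.

```ruby
require "timeout"

# Hypothetical illustration only; not taken from Net::IMAP.
class ResponseWaiter
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @exception = nil
    @done = false
  end

  # Called by the (hypothetical) receiver side when it hits an error.
  def fail_with(error)
    @mutex.synchronize do
      @exception = error
      @cond.broadcast # wakes current waiters; later waiters must check state
    end
  end

  # Called when the awaited response has arrived.
  def complete
    @mutex.synchronize do
      @done = true
      @cond.broadcast
    end
  end

  # Blocks until completion, failure, or the timeout (in seconds) elapses.
  def wait(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @done
        # Check BEFORE waiting: a broadcast sent earlier is not replayed.
        raise @exception if @exception
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise Timeout::Error, "no response within #{timeout}s" if remaining <= 0
        @cond.wait(@mutex, remaining)
      end
    end
    :done
  end
end

waiter = ResponseWaiter.new
waiter.fail_with(IOError.new("connection lost")) # failure happens first
begin
  waiter.wait(1) # raises immediately instead of blocking
rescue IOError => e
  puts "raised: #{e.message}"
end
```

If the check happened only after `@cond.wait` returned, and no timeout were given, a waiter that arrived after the broadcast would have nothing left to wake it. Whether that ordering can actually occur in `Net::IMAP` is exactly the open question in this issue.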