Always acquire client lock before coordinator lock to avoid deadlocks #1464

dpkp · 2018-03-28T23:47:06Z

This is an attempt to address the deadlock issue described in #1461. The first option discussed, delaying errback processing until after the client lock is released, would require a lot of work. Instead, I've tried to change the lock acquisition paths so that the coordinator never attempts to acquire the client lock while it holds the coordinator lock. This is done either by releasing the coordinator lock before calling functions that acquire the client lock, or by preemptively acquiring the client lock before acquiring the coordinator lock.
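A minimal sketch of the two approaches, for illustration only. The `Client` and `Coordinator` classes and their methods below are hypothetical stand-ins, not the actual kafka-python code; the only thing taken from this PR is the ordering rule (client lock first, coordinator lock second).

```python
import threading


class Client:
    """Hypothetical stand-in for the network client."""

    def __init__(self):
        self._lock = threading.RLock()

    def poll(self):
        # Any call into the client takes the client lock.
        with self._lock:
            return "polled"


class Coordinator:
    """Hypothetical stand-in for the group coordinator."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.RLock()
        self.state = "unjoined"

    def join_group(self):
        # Approach 1: preemptively acquire the client lock, then the
        # coordinator lock. `with a, b:` acquires a first, then b.
        with self._client._lock, self._lock:
            self._client.poll()
            self.state = "joined"

    def refresh(self):
        # Approach 2: read shared state under the coordinator lock,
        # release it, and only then call into the client.
        with self._lock:
            needs_poll = self.state == "joined"
        if needs_poll:
            self._client.poll()
        return needs_poll
```

Because every path takes the client lock before the coordinator lock (or never holds both at once), no thread can hold the coordinator lock while waiting on the client lock, which is the cycle that makes a deadlock possible.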

dpkp · 2018-04-16T03:01:10Z

Any comments on this? If not, I'm going to merge. I have been able to reproduce the deadlock locally on master (though not consistently). With this patch applied I have not seen a deadlock.

jeffwidman

Afraid I haven't had much time lately for open source, so I'm only just now looking at this. Everything looks fine to me at a surface level, but this is a somewhat hairy problem, and right now I don't have the time to dig deeply into the design or the problems with the alternative.

If you don't hear from @tvoinarovskyi , then just go ahead and merge.

haosdent · 2018-06-18T10:25:46Z

@dpkp We still encounter the same issue in 1.4.3. Do you need to modify https://github.com/dpkp/kafka-python/blob/1.4.3/kafka/coordinator/base.py#L928 to `with self.coordinator._client._lock, self.coordinator._lock:`?

dpkp · 2018-06-18T19:40:53Z

Yes, I think you're right!

Edit: looking again at this code, there is no path within the block that leads to an attempt to acquire the client lock, afaik. That is why I did not acquire both locks upfront, so I don't think acquiring the client lock here will fix it.

If you are still seeing deadlock issues, can you start a new issue and post debug logs?

Always acquire client lock before coordinator lock to avoid deadlocks

a556f5c

jeffwidman requested a review from tvoinarovskyi April 6, 2018 18:10

jeffwidman approved these changes Apr 17, 2018

dpkp merged commit 1c71dfc into master Apr 18, 2018

jeffwidman deleted the coordinator_client_deadlock branch April 18, 2018 23:23

robgolding mentioned this pull request Jun 13, 2018

ensure_coordinator_ready loops indefinitely if coordinator not ready #1493

Closed

88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Aug 1, 2018

Always acquire client lock before coordinator lock to avoid deadlocks (…

17fcdb1

…dpkp#1464)

88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Aug 1, 2018

Always acquire client lock before coordinator lock to avoid deadlocks (…

03ef066

…dpkp#1464)

zhgjun mentioned this pull request Oct 31, 2018

consumer deadlock happen after log “Heartbeat: local member_id was not recognized; this consumer needs to re-join” #1623

Closed
