Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always acquire client lock before coordinator lock to avoid deadlocks #1464

Merged
merged 1 commit into from
Apr 18, 2018

Conversation

dpkp
Copy link
Owner

@dpkp dpkp commented Mar 28, 2018

This is an attempt to address the deadlock issue described in #1461 . The first option discussed, delaying errback processing until after release of the client lock, would require a lot of work. I've instead tried to change the lock acquisition paths so that the coordinator never attempts to acquire the client lock after it holds the coordinator lock. This is done by either releasing the coordinator lock before calling functions that acquire the client lock, or by preemptively acquiring the client lock before acquiring the coordinator lock.

@dpkp
Copy link
Owner Author

dpkp commented Apr 16, 2018

Any comments on this? If not, I'm going to merge. I have been able to reproduce the deadlock locally on master (though not consistently). With this patch applied I have not seen a deadlock.

Copy link
Collaborator

@jeffwidman jeffwidman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afraid I haven't had much time lately for open source, and so just now looking at it. Everything looks fine to me at a surface level, but this is a somewhat hairy problem and I'm afraid that right now I don't have the time to dig deeply into the design/problems with the alternative.

If you don't hear from @tvoinarovskyi , then just go ahead and merge.

@dpkp dpkp merged commit 1c71dfc into master Apr 18, 2018
@jeffwidman jeffwidman deleted the coordinator_client_deadlock branch April 18, 2018 23:23
@haosdent
Copy link
Contributor

@dpkp We still encounter the same issue in 1.4.3 Do you need to modify https://github.com/dpkp/kafka-python/blob/1.4.3/kafka/coordinator/base.py#L928 to with self. coordinator._client._lock, self. coordinator._lock:

@dpkp
Copy link
Owner Author

dpkp commented Jun 18, 2018

Yes, I think you're right!

Edit: looking again at this code, there is no path within the block that leads to an attempt to acquire the client lock afaik. So that is why I did not acquire both locks upfront. So I don't think acquiring the client lock here will fix.

If you are still seeing deadlock issues, can you start a new issue and post debug logs ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants