fix(replication): potential deadlock when switching master frequently #2516
This closes #2512.
Currently, the replication thread waits to acquire the workers' exclusive guard before closing the DB. However, it only stops workers from running new commands *after* acquiring the exclusive guard, which might cause a deadlock if the master is switched concurrently.
The following steps show how it may happen:

- T0: client A sent `slaveof MASTER_IP0 MASTER_PORT0`, then the replication thread was started and began waiting for the exclusive guard.
- T1: client B sent `slaveof MASTER_IP1 MASTER_PORT1`, and `AddMaster` will stop the previous replication thread, which is still waiting for the exclusive guard. But the exclusive guard is already held by the current thread, which is waiting for that replication thread to stop, so neither thread can make progress.

The workaround is also straightforward: stop workers from running new commands by setting `is_loading_` to true *before* acquiring the lock in the replication thread.
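A toy model may help make the T0/T1 interleaving concrete. This is a minimal sketch under stated assumptions, not the Kvrocks implementation: `ToyServer`, `works_guard`, `SwitchMaster`, and the `in_flight` gate are hypothetical stand-ins. A `std::shared_timed_mutex` plays the workers' exclusive guard, an atomic flag plays `is_loading_`, and the gate models in-flight commands that keep the guard busy at T0. The real replication thread waits without a timeout; the sketch uses `try_lock_for` so the "deadlock" case terminates and can be reported instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>

using namespace std::chrono_literals;

// Hypothetical toy model; names are illustrative, not Kvrocks source.
struct ToyServer {
  std::shared_timed_mutex works_guard;  // stand-in for the workers' exclusive guard
  std::atomic<bool> is_loading{false};  // stand-in for is_loading_
};

// Replays T0/T1 and returns "ok" if the old replication thread managed to get
// the guard (and could stop), or "deadlock" if it would have waited forever.
std::string SwitchMaster(bool set_loading_first) {
  ToyServer srv;
  std::promise<void> ready, in_flight_done;
  std::future<void> ready_f = ready.get_future();
  std::shared_future<void> in_flight = in_flight_done.get_future().share();
  bool replication_got_guard = false;

  // T0: replication thread started by client A's SLAVEOF.
  std::thread replication([&] {
    if (set_loading_first) srv.is_loading = true;  // the fix: refuse new commands first
    ready.set_value();
    in_flight.wait();  // hypothetical: in-flight commands keep the guard busy until here
    std::unique_lock<std::shared_timed_mutex> guard(srv.works_guard, std::defer_lock);
    // The real thread waits with no timeout; 200ms only keeps the demo from hanging.
    if (guard.try_lock_for(200ms)) {
      if (!set_loading_first) srv.is_loading = true;  // old ordering: set after the guard
      replication_got_guard = true;  // ... close and restore the DB here ...
    }
  });
  ready_f.wait();

  // T1: client B's SLAVEOF arrives at a worker.
  std::unique_lock<std::shared_timed_mutex> b_guard(srv.works_guard, std::defer_lock);
  if (!srv.is_loading) b_guard.lock();  // not refused: SLAVEOF holds the exclusive guard
  in_flight_done.set_value();           // in-flight commands finish
  replication.join();                   // AddMaster stops the previous replication thread
  return replication_got_guard ? "ok" : "deadlock";
}
```

In this model, `SwitchMaster(false)` reports `"deadlock"` after the 200 ms timeout, because client B's command holds the guard while joining the replication thread that needs it. `SwitchMaster(true)` reports `"ok"`, because the flag is already set when client B's command arrives, so it is refused instead of taking the guard. The model only covers this one interleaving; it does not claim anything about other orderings.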