After analyzing the gdb stack, we found that one of the worker threads was blocked waiting for ReplicationThread::Stop to return, while ReplicationThread::Stop was blocked on the WorkConcurrencyGuard held by that same worker.
In other words, the replication thread is waiting for the WorkExclusivityGuard, which cannot be granted while the worker thread holds its guard, and the worker thread is in turn waiting for the replication thread to stop, so neither can make progress.
…apache#2516)
This closes apache#2512.
Currently, the replication thread waits to acquire the worker's exclusive guard before closing the DB.
However, it only stops the workers from running new commands after it has acquired the exclusive guard,
which can cause a deadlock if the master is switched again at the same time.
The following steps show how this can happen:
- T0: client A sends `slaveof MASTER_IP0 MASTER_PORT0`; the replication thread is started and waits for the exclusive guard.
- T1: client B sends `slaveof MASTER_IP1 MASTER_PORT1`, and `AddMaster` tries to stop the previous replication thread, which is still waiting for the exclusive guard. But the guard is held by the current thread, which is itself waiting for the replication thread to stop.
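The wait cycle in the steps above can be illustrated with a small, self-contained C++17 sketch. It is not Kvrocks code: `demo_works_lock` and `ExclusiveLockAvailableWhileWorkerRuns` are hypothetical names, and a plain `std::shared_mutex` is assumed as a stand-in for the lock behind the worker guards. The stand-in replication thread uses `try_lock()` instead of a blocking `lock()` so that the demo terminates instead of hanging.

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for the lock behind the worker guards.
std::shared_mutex demo_works_lock;

// Returns true if the "replication thread" could take the exclusive side
// while a worker still holds the shared side.
bool ExclusiveLockAvailableWhileWorkerRuns() {
  // The worker takes the shared (concurrency) side before running a command.
  std::shared_lock<std::shared_mutex> worker_guard(demo_works_lock);

  bool acquired = false;
  // Stand-in for the old replication thread. A blocking lock() here would
  // never return, because the worker below joins while holding its guard.
  std::thread replication([&acquired] {
    acquired = demo_works_lock.try_lock();
    if (acquired) demo_works_lock.unlock();
  });

  // Stand-in for the worker stopping the old replication thread: it waits
  // for that thread to finish while still holding worker_guard.
  replication.join();
  return acquired;
}
```

Calling `ExclusiveLockAvailableWhileWorkerRuns()` returns `false`: the exclusive side is unavailable for as long as the worker holds its guard, and the worker does not release the guard until the join returns.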
The workaround is straightforward: stop the workers from running new commands by setting `is_loading_` to
true before acquiring the lock in the replication thread.
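A minimal sketch of that ordering, under stated assumptions, might look like the following. `ServerModel`, `TryRunCommand`, `PrepareRestore`, and `FinishRestore` are hypothetical names used only for illustration, and the actual change in the PR may differ in its details. The sketch shows only the flag-then-lock order described above; it does not reproduce the deadlock itself.

```cpp
#include <atomic>
#include <mutex>
#include <shared_mutex>

// Hypothetical model of the server state touched by the workaround.
struct ServerModel {
  std::atomic<bool> is_loading{false};  // mirrors the role of `is_loading_`
  std::shared_mutex works_lock;         // assumed lock behind the guards
  std::atomic<int> executed{0};         // number of commands that ran

  // Worker side: refuse new commands while loading, otherwise run the
  // command under the shared (concurrency) side of the lock.
  bool TryRunCommand() {
    if (is_loading.load()) return false;
    std::shared_lock<std::shared_mutex> guard(works_lock);
    ++executed;
    return true;
  }

  // Replication side: raise the flag first so no new command starts, then
  // take the exclusive side to wait for commands that are already running.
  void PrepareRestore() {
    is_loading.store(true);
    std::unique_lock<std::shared_mutex> guard(works_lock);
    // At this point no worker is inside a command.
  }

  // Called once the restore is done so workers accept commands again.
  void FinishRestore() { is_loading.store(false); }
};
```

With this model, `TryRunCommand()` succeeds before `PrepareRestore()`, is rejected after it, and succeeds again after `FinishRestore()`. A worker that passed the flag check just before the flag was raised still runs under the shared lock, so the exclusive lock simply waits for it to finish.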
Search before asking
Version
unstable
Minimal reproduce step
None
What did you expect to see?
No deadlock in any situation.
What did you see instead?
The worker threads get stuck after switching the master frequently:
Anything Else?
No response
Are you willing to submit a PR?