fix(replication): potential deadlock when switching master frequently #2516

Merged

git-hulk commented Sep 1, 2024

This closes #2512.

Currently, the replication thread waits for the workers' exclusive guard before closing the DB. However, it only stops the workers from running new commands after it has acquired that exclusive guard, which can cause a deadlock if the master is switched again at the same time.

The following steps show how it can happen:

  • T0: client A sends `slaveof MASTER_IP0 MASTER_PORT0`; the replication thread is started and waits for the exclusive guard.

  • T1: client B sends `slaveof MASTER_IP1 MASTER_PORT1`, and `AddMaster` stops the previous replication thread, which is still waiting for the exclusive guard. But that guard is held by the current thread (the one executing client B's command), so neither side can make progress.
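
The interleaving above can be modeled with a `std::shared_timed_mutex` standing in for the worker exclusivity guard and `std::thread::join()` standing in for `AddMaster` stopping the old replication thread. This is a minimal sketch under assumptions, not Kvrocks code: `works_lock` and `OldReplicationThreadGetsGuard` are hypothetical names, the sketch sets up the state reached at T1 directly instead of replaying the timing, and the waiter uses a timeout only so the demo terminates.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for the worker exclusivity lock: commands normally hold
// it shared; an exclusive command (assumed here to include SLAVEOF) or the
// replication thread holds it exclusively.
std::shared_timed_mutex works_lock;

// Models the T0/T1 state. Returns true if the old replication thread got the
// exclusive guard, false if it gave up after `timeout`. The timeout exists only
// so this demo terminates; the thread in the scenario waits without a deadline.
bool OldReplicationThreadGetsGuard(std::chrono::milliseconds timeout) {
  assert(timeout.count() > 0);
  bool acquired = false;

  // T1: client B's SLAVEOF is running and holds the exclusive guard.
  std::unique_lock<std::shared_timed_mutex> client_b_guard(works_lock);

  // T0: the replication thread started for client A waits for the same guard.
  std::thread replication([&acquired, timeout] {
    std::unique_lock<std::shared_timed_mutex> guard(works_lock, std::defer_lock);
    acquired = guard.try_lock_for(timeout);
  });

  // AddMaster stops the previous replication thread (modeled as join) while
  // still holding the guard, so each side is waiting on the other.
  replication.join();
  return acquired;
}
```

Calling `OldReplicationThreadGetsGuard(std::chrono::milliseconds(50))` returns `false`: the waiter can only give up on its own, because the guard holder is blocked in `join()` until the waiter exits.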

The workaround is straightforward: stop the workers from running new commands by setting `is_loading_` to true before the replication thread acquires the lock.
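
The reordering can be sketched the same way. In this hypothetical model (`ServerModel`, `RunExclusiveCommand`, `PrepareRestore`, and `FinishRestore` are illustrative names; only `is_loading_` comes from the description), a worker checks the loading flag before it takes the guard, and the replication thread sets the flag before waiting for the exclusive guard, so a new `slaveof` is refused instead of holding the guard while stopping the replication thread.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical model of the ordering fix; not the Kvrocks implementation.
class ServerModel {
 public:
  // Worker path: refuse new commands while loading, before touching the guard.
  // Returns false when the command is rejected.
  bool RunExclusiveCommand() {
    if (is_loading_.load()) return false;  // rejected; guard never taken
    std::unique_lock<std::shared_mutex> guard(works_lock_);
    // A SLAVEOF-like command would call its AddMaster equivalent here.
    return true;
  }

  // Replication-thread path after the fix: publish the loading flag first,
  // then wait for in-flight commands by taking the exclusive guard.
  void PrepareRestore() {
    is_loading_.store(true);
    std::unique_lock<std::shared_mutex> guard(works_lock_);
    // Holding the guard means no command is running; restore work would go here.
  }

  // Clears the flag once the (modeled) restore is done.
  void FinishRestore() {
    assert(is_loading_.load());  // finishing a restore that never started is a bug
    is_loading_.store(false);
  }

  bool IsLoading() const { return is_loading_.load(); }

 private:
  std::atomic<bool> is_loading_{false};
  std::shared_mutex works_lock_;
};
```

In this sketch, a command that read the flag just before it was set can still reach the guard, so the ordering narrows the window rather than removing it entirely.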

git-hulk changed the title from "Fix potential deadlock when switching master frequently" to "fix(replication): potential deadlock when switching master frequently" on Sep 1, 2024

git-hulk merged commit ab41cbb into apache:unstable on Sep 2, 2024
32 checks passed

Successfully merging this pull request may close these issues.

Potential dead lock if switching different master frequently