-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock between health check and topology watcher #16995
Conversation
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #16995 +/- ##
==========================================
- Coverage 69.34% 67.02% -2.32%
==========================================
Files 1571 1571
Lines 204180 251681 +47501
==========================================
+ Hits 141582 168688 +27106
- Misses 62598 82993 +20395 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]> Signed-off-by: Florent Poinsard <[email protected]>
…#16995) (#17009) Signed-off-by: Manan Gupta <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
…#16995) (#17008) Signed-off-by: Manan Gupta <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
…#16995) (#17007) Signed-off-by: Manan Gupta <[email protected]> Signed-off-by: Florent Poinsard <[email protected]> Co-authored-by: Manan Gupta <[email protected]>
…#16995) (#17010) Signed-off-by: Manan Gupta <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Description
This PR fixes the deadlock described in #16994.
The problem was that the sending on the channel
loadTabletsTrigger
inupdateHealth
was a blocking call. So, in this PR, I've changed the channel from being an unbuffered channel to a buffered channel of a single capacity. I have also updated the sending on the channelloadTabletsTrigger
inupdateHealth
to be unblocking.This fix has 2 advantages -
updateHealth
cannot block on sending on the channel. It will either succeed in sending, or just move ahead since the channel would already be full.updateHealth
calls happen really quickly, we won't trigger multiple runs of loadTablet since we won't send on an already full channel.Since this is a fix for a deadlock, it needs to be backported to all release branches since all of them have this issue.
Related Issue(s)
Checklist
Deployment Notes