VReplication workflows: retry "wrong tablet type" errors #16645
Signed-off-by: Rohit Nayak <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16645      +/-   ##
==========================================
+ Coverage   68.96%   68.97%   +0.01%
==========================================
  Files        1562     1562
  Lines      200730   200733       +3
==========================================
+ Hits       138430   138457      +27
+ Misses      62300    62276      -24
This is a very simple solution and makes perfect sense.
I don't like how we identify the error by text/regexp; we should use something more formal, like a specific error object or an error code. But this is outside the scope of this PR.
I reused how query serving and vttablet state manager were using that error in
Absolutely. This was a side note unrelated to the PR.
LGTM
…rs (#16645) (#16653) Signed-off-by: Rohit Nayak <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
…rs (#16645) (#16651) Signed-off-by: Rohit Nayak <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com> Co-authored-by: Rohit Nayak <[email protected]>
…rs (#16645) (#16652) Signed-off-by: Rohit Nayak <[email protected]> Co-authored-by: Rohit Nayak <[email protected]> Co-authored-by: Rohit Nayak <[email protected]>
) Signed-off-by: Rohit Nayak <[email protected]>
Description
If a tablet picker is configured to pick `REPLICA` tablet types only, and the selected tablet is promoted to `PRIMARY` because of a PRS or ERS just as it starts streaming, the vttablet returns a gRPC error like:
"error": "vttablet: rpc error: code = FailedPrecondition desc = wrong tablet type: PRIMARY, want: REPLICA or []",
Currently all `FailedPrecondition` errors are treated as unrecoverable for vreplication workflows. This PR adds an exception for the `wrong tablet` `FailedPrecondition` error and treats it as recoverable.
This will resolve situations where there was a race in which the picked `REPLICA` tablet no longer matched the `query.Target`, which has the expected `TabletType`.
Note:
It is possible that this is not a race, but a missing watcher event. In that case, unless the local health check state gets refreshed with the new info, we will still see the same errors, if that tablet was selected again.
Backport reason: We should fix this on all supported versions, since it is a long-standing issue.
Related Issue(s)
#16646