Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: allow current/before PRIMARY to be specified in ERS #16430

Open
timvaillancourt opened this issue Jul 18, 2024 · 4 comments · May be fixed by #16852
Open

Feature Request: allow current/before PRIMARY to be specified in ERS #16430

timvaillancourt opened this issue Jul 18, 2024 · 4 comments · May be fixed by #16852
Assignees
Labels
Component: Cluster management Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: Feature Request

Comments

@timvaillancourt
Copy link
Contributor

timvaillancourt commented Jul 18, 2024

Feature Description

This issue proposes that the current/before shard PRIMARY can be specified in PlannedReparentShard EmergencyReparentShard RPC requests in order to support a more idempotent(?) operation, where if the provided, "current" PRIMARY is no longer correct, the operation fails or no-ops

This could be implemented by adding an optional request-field ExpectedPrimary *topodatapb.TabletAlias containing the alias of who you think the PRIMARY is "now". In the EmergencyReparentShard .reparentShardLocked(...) logic the current primary can be compared to this alias and a mismatch is handled

Your feedback is much appreciated!

Use Case(s)

This support can prevent situations when an issue is fixed by VTOrc while external automation is also issuing EmergencyReparentShard operations

It would be great if the external automation could say "PRS only if alias X is still the PRIMARY" and if something changed the world in the process, this operation does no go through

@timvaillancourt timvaillancourt added Type: Feature Request Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VTorc Vitess Orchestrator integration Component: vtctl Needs Triage This issue needs to be correctly labelled and triaged labels Jul 18, 2024
@timvaillancourt timvaillancourt self-assigned this Jul 18, 2024
@shlomi-noach shlomi-noach removed the Needs Triage This issue needs to be correctly labelled and triaged label Jul 21, 2024
@GuptaManan100 GuptaManan100 added Component: Cluster management Type: Feature Request Type: Enhancement Logical improvement (somewhere between a bug and feature) and removed Type: Feature Request Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VTorc Vitess Orchestrator integration Component: vtctl labels Jul 26, 2024
@GuptaManan100
Copy link
Member

Your understanding of AvoidPrimaryAlias is correct.

PRS is already idempotent as long as you provide the NewPrimary. If you run the PRS over and over with the new primary alias, after the first PRS that promotes the primary, it will only check that all the replicas are pointing to the correct primary in the subsequent runs.

@GuptaManan100
Copy link
Member

VTOrc only runs PRS for a new uninitialized cluster. After this point, VTOrc never runs PRS. So I'm not sure if there are going to be that many collisions in external automation vs VTOrc running PRS.

@deepthi
Copy link
Member

deepthi commented Aug 5, 2024

There are multiple ways of solving this without adding another flag to PRS

  • external automation checks for the desired condition before issuing a PRS
  • use AvoidPrimaryAlias as Manan suggested
    Manan's point about such collisions being rare in practice is also valid. Perhaps you could elaborate on what situation you are running into and trying to solve.

@timvaillancourt timvaillancourt linked a pull request Sep 26, 2024 that will close this issue
5 tasks
@timvaillancourt timvaillancourt changed the title Feature Request: allow current/before PRIMARY to be specified in PRS Feature Request: allow current/before PRIMARY to be specified in ERS Sep 26, 2024
@timvaillancourt
Copy link
Contributor Author

timvaillancourt commented Sep 26, 2024

@GuptaManan100 / @deepthi it looks like I made a mistake in relaying the problem internally. What we're looking for is adding the "expected" primary to EmergencyReparentShard, not PlannedReparentShard 🤦

To rephrase the scenario again, we would like the ability to specify the PRIMARY we expect to be the current primary in EmergencyReparentShard requests that importantly do not specify a candidate

This will allow external automation to be sure an ERS will happen only if the requestor's view of the world is still correct. If the PRIMARY changed in the time it took to issue an RPC, we'd like that request to fail instead of triggering a needless reparent

I've implemented this approach in this PR to illustrate: #16852. Your thoughts are appreciated! 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Cluster management Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: Feature Request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants