
Set leave_on_terminate=true for servers and hardcode maxUnavailable=1 #3000

Merged 1 commit on Jan 16, 2024

Commits on Jan 16, 2024

  1. Set leave_on_terminate=true for servers and hardcode maxUnavailable=1

    When leave_on_terminate=false (default), rolling the statefulset is
    disruptive because the new servers come up with the same node IDs but
    different IP addresses. They can't join the server cluster until the old
    server's node ID is marked as failed by serf. During this time, they continually
    start leader elections because they don't know there's a leader. When
    they eventually join the cluster, their election term is higher, and so
    they trigger a leadership swap. The leadership swap happens at the same
    time as the next node to be rolled is being stopped, and so the cluster
    can end up without a leader.
    
    With leave_on_terminate=true, the stopping server cleanly leaves the
    cluster, so the new server can join smoothly, even though it has the
    same node ID as the old server. This increases the speed of the rollout
    and in my testing eliminates the period without a leader.
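A minimal sketch of the server agent setting this describes (`leave_on_terminate` is the real Consul option; the surrounding config is illustrative, not taken from this PR):

```hcl
# Consul server agent config (HCL) - illustrative sketch.
# leave_on_terminate makes the server gracefully leave the cluster on
# SIGTERM, so its node ID is released before the replacement pod starts.
server             = true
leave_on_terminate = true
```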
    
    The downside of this change is that when a server leaves gracefully, it
    also reduces the number of raft peers. The number of peers is used to
    calculate the quorum size, so this can unexpectedly change the fault
    tolerance of the cluster. When running with an odd number of servers, 1
    server leaving the cluster does not affect quorum size. E.g. 5 servers
    => quorum 3, 4 servers => quorum still 3. During a rollout, Kubernetes
    only stops 1 server at a time, so the quorum won't change. During a
    voluntary disruption event, e.g. a node being drained, Kubernetes uses
    the pod disruption budget to determine how many pods in a statefulset
    can be made unavailable at a time. That's why this change hardcodes
    the pod disruption budget's maxUnavailable to 1.
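The quorum arithmetic above can be sketched quickly (this is the standard Raft majority formula, not code from this PR):

```python
def quorum(peers: int) -> int:
    """Raft quorum size: a strict majority of the current peer set."""
    return peers // 2 + 1

# With 5 servers, quorum is 3; if one server leaves gracefully,
# the peer set shrinks to 4 but quorum is still 3.
print(quorum(5))  # 3
print(quorum(4))  # 3

# With an even starting count, a graceful leave does lower quorum:
print(quorum(6))  # 4
print(quorum(5))  # 3
```

This is why a rollout that removes one server at a time from an odd-sized cluster keeps fault tolerance intact, and why the PDB caps voluntary disruptions at one pod.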
    
    Also set autopilot min_quorum to the minimum quorum size, and disable
    autopilot upgrade migration since that feature is intended for
    blue/green deploys.
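A sketch of the autopilot section this implies (`min_quorum` and `disable_upgrade_migration` are real Consul autopilot options; the value shown assumes a 3-server cluster and is illustrative):

```hcl
# Autopilot section of the Consul server config - illustrative sketch.
autopilot {
  # Prevent autopilot from pruning dead servers below quorum
  # (2 is the quorum of a 3-server cluster).
  min_quorum = 2

  # Upgrade migration is for blue/green server deploys, which this
  # rollout strategy does not use.
  disable_upgrade_migration = true
}
```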
    lkysow committed Jan 16, 2024