
Delete PDBs for hub/proxy deployments that doesn't support HA #1937

Closed
wants to merge 4 commits into from

Conversation

consideRatio
Member

@consideRatio consideRatio commented Dec 10, 2020

For a discussion about whether this PR is the right way to go, let's continue in #1934.

Theory

The idea of using a PDB is tightly coupled with applications that support high availability (HA), in other words, applications that run in parallel to another replica. But, the hub/proxy/autohttps pod doesn't support it. As an indication of this, see the k8s docs on "configure-pdb", which has a "Before you begin" section that says "You are the owner of an application running on a Kubernetes cluster that requires high availability."

ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
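
For concreteness, this is roughly the kind of PDB the chart has been rendering for the hub (the name and labels here are illustrative, not the chart's exact rendered output). With a single-replica deployment, `minAvailable: 1` means the only hub pod can never be evicted voluntarily:

```yaml
# Illustrative PDB resembling what the chart creates for the hub.
# With a 1 replica hub deployment, minAvailable: 1 blocks all voluntary
# evictions (node drains, cluster-autoscaler scale downs) of the only
# hub pod, even though a brief disruption is often acceptable.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: hub
spec:
  minAvailable: 1
  selector:
    matchLabels:
      component: hub
```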

PR proposal and outcome

The proposal is to remove the PDB resources for those of our deployment resources that don't support HA. It would close #1934.

I perceive the outcome to be that we...

  • Avoid misleading users into believing we have HA support for the hub/proxy pod just because the chart exposes a configurable PDB.
  • Avoid breaking assumptions that come with having a restrictive PDB enabled by default for a 1 replica deployment.
  • Require users who want to keep the cluster-autoscaler's scale downs, with their associated pod evictions, from disrupting the service to use another strategy. Such strategies are...
    1. To use an annotation ("cluster-autoscaler.kubernetes.io/safe-to-evict": "false") that influences the Cluster Autoscaler specifically, but wouldn't influence for example a scheduled and automated k8s node upgrade or a manual kubectl drain command. For more information see this documentation.
    2. To intentionally schedule the core pods to a node pool separate from the user pool, which naturally needs to be more flexible and scale up and down. This is relatively easy with the configuration described in the optimization section. We even have dedicated flags to make this easier, such as the scheduling.corePods.nodeAffinity.matchNodePurpose configuration. A sketch of both strategies is shown after this list.
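
A rough sketch of both strategies expressed as chart configuration. The scheduling.corePods.nodeAffinity.matchNodePurpose key is the flag mentioned above, while hub.annotations is an assumption on my part about how pod annotations are exposed and may differ between chart versions:

```yaml
# Strategy 1: annotate the hub pod so the Cluster Autoscaler won't
# evict it during a scale down. Note: hub.annotations is assumed here
# to set the hub pod's annotations - verify against your chart's
# values reference.
hub:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

# Strategy 2: require the core pods (hub, proxy, etc.) to schedule onto
# a dedicated core node pool, separate from the user node pool that the
# cluster-autoscaler scales up and down.
scheduling:
  corePods:
    nodeAffinity:
      matchNodePurpose: require
```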

Why the extra commits?

I was initially on track to disable these resources by default instead of deleting them, but I concluded it was even better to remove them after having written the commit message of 7ccfd50. By removing them, the chart's intent is communicated clearly and we avoid maintaining something that isn't part of that intent.

Some of the motivations for this choice are captured below:

1. Most other Helm charts default to having PDBs disabled, making our
   chart's behaviour unexpected.

2. The idea of using a PDB is tightly coupled with applications that
   support high availability, in other words, applications that run in
   parallel to another replica. But, the hub/proxy/autohttps pod
   doesn't support it. As an indication of this, see the k8s docs on
   "configure-pdb", which has a "Before you begin" section that says
   "You are the owner of an application running on a Kubernetes
   cluster that requires high availability."

   ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/

3. When evictions take place, such as when a node is drained, the PDBs
   will be respected, and since we have `minAvailable: 1` and cannot
   create another replica, a manual pod deletion is required.

   It is my perception that the act of draining a node should typically
   come with intent and consideration. If it comes as part of an
   automated k8s upgrade of nodes, it should at least be scheduled in
   an acceptable time window etc. So, it wouldn't be helpful to add the
   need to run `kubectl delete pod` alongside this.

4. If the wish is to avoid having the cluster autoscaler relocate a pod
   that is part of a deployment, then it is more direct and sensible to
   use the `"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"`
   annotation. A sketch of how this looks on the pod template is shown
   below the list.

   ref: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
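
To connect points 3 and 4: the Cluster Autoscaler reads the annotation from the pod itself, so whichever mechanism sets it, it needs to end up in the deployment's pod template. A minimal sketch, with illustrative names, labels, and image tag rather than the chart's actual rendered output:

```yaml
# Illustrative hub Deployment with the safe-to-evict annotation on the
# pod template, which is where the Cluster Autoscaler reads it. Unlike
# a PDB, this only influences CA scale downs; it does not block a
# manual kubectl drain or an automated node upgrade.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub
spec:
  replicas: 1
  selector:
    matchLabels:
      component: hub
  template:
    metadata:
      labels:
        component: hub
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
        - name: hub
          image: jupyterhub/k8s-hub:example-tag  # illustrative tag
```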
@consideRatio consideRatio changed the title PDBs for hub/proxy/autohttps: default to disabled, config relocated, autohttps added Delete PDBs for hub/proxy/autohttps deployments that doesn't support HA Dec 10, 2020
@consideRatio consideRatio changed the title Delete PDBs for hub/proxy/autohttps deployments that doesn't support HA Delete PDBs for hub/proxy deployments that doesn't support HA Dec 10, 2020
@consideRatio
Member Author

Closed in favor of #1938
