
Delete PDBs for hub/proxy deployments that doesn't support HA #1937

Closed
wants to merge 4 commits into from

Conversation

consideRatio
Member

@consideRatio consideRatio commented Dec 10, 2020

For a discussion about whether this PR is the right way to go, let's continue in #1934.

Theory

The idea of using a PDB is tightly coupled with applications that support high availability (HA), in other words, applications that run in parallel to another replica. But, the hub/proxy/autohttps pod doesn't support it. As an indication of this, see the k8s docs on "configure-pdb", which has a "Before you begin" section that says "You are the owner of an application running on a Kubernetes cluster that requires high availability."

ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
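
For concreteness, this is roughly the kind of PDB the chart has been rendering for the hub (the name and labels here are illustrative, not the chart's exact rendered output). With a single-replica deployment, `minAvailable: 1` means the only hub pod can never be evicted voluntarily:

```yaml
# Illustrative PDB resembling what the chart creates for the hub.
# With a 1 replica hub deployment, minAvailable: 1 blocks all voluntary
# evictions (node drains, cluster-autoscaler scale downs) of the only
# hub pod, even though a brief disruption is often acceptable.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: hub
spec:
  minAvailable: 1
  selector:
    matchLabels:
      component: hub
```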

PR proposal and outcome

The proposal is to remove the PDB resources for those of our deployment resources that don't support HA. It would close #1934.

I perceive the outcome to be that we...

  • Avoid misleading users into believing we have HA support for the hub/proxy pod just because the chart exposes a configurable PDB.
  • Avoid breaking assumptions that come with having a restrictive PDB enabled by default for a 1 replica deployment.
  • Require users who want to keep the cluster-autoscaler's scale downs, with their associated pod evictions, from disrupting the service to use another strategy. Such strategies are...
    1. To use an annotation ("cluster-autoscaler.kubernetes.io/safe-to-evict": "false") that influences the Cluster Autoscaler specifically, but wouldn't influence for example a scheduled and automated k8s node upgrade or a manual kubectl drain command. For more information see this documentation.
    2. To intentionally schedule the core pods to a node pool separate from the user pool, which naturally needs to be more flexible and scale up and down. This is relatively easy with the configuration described in the optimization section. We even have dedicated flags to make this easier, such as the scheduling.corePods.nodeAffinity.matchNodePurpose configuration. A sketch of both strategies is shown after this list.
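
A rough sketch of both strategies expressed as chart configuration. The scheduling.corePods.nodeAffinity.matchNodePurpose key is the flag mentioned above, while hub.annotations is an assumption on my part about how pod annotations are exposed and may differ between chart versions:

```yaml
# Strategy 1: annotate the hub pod so the Cluster Autoscaler won't
# evict it during a scale down. Note: hub.annotations is assumed here
# to set the hub pod's annotations - verify against your chart's
# values reference.
hub:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

# Strategy 2: require the core pods (hub, proxy, etc.) to schedule onto
# a dedicated core node pool, separate from the user node pool that the
# cluster-autoscaler scales up and down.
scheduling:
  corePods:
    nodeAffinity:
      matchNodePurpose: require
```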

Why the extra commits?

I was initially on track to disable these resources by default instead of deleting them, but I concluded it was even better to remove them after having written the commit message of 7ccfd50. By removing them, the chart's intent is communicated clearly and we avoid maintaining something that isn't part of that intent.

Some of the motivations for this choice are captured below:

1. Most other Helm charts default to having PDBs disabled, making our
   chart's behaviour unexpected.

2. The idea of using a PDB is tightly coupled with applications that
   support high availability, in other words, applications that run in
   parallel to another replica. But, the hub/proxy/autohttps pod
   doesn't support it. As an indication of this, see the k8s docs on
   "configure-pdb", which has a "Before you begin" section that says
   "You are the owner of an application running on a Kubernetes
   cluster that requires high availability."

   ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/

3. When evictions take place, such as when a node is drained, the PDBs
   will be respected, and since we have `minAvailable: 1` and cannot
   create another replica, a manual pod deletion is required.

   It is my perception that the act of draining a node should typically
   come with intent and consideration. If it comes as part of an
   automated k8s upgrade of nodes, it should at least be scheduled in
   an acceptable time window etc. So, it wouldn't be helpful to add the
   need to run `kubectl delete pod` alongside this.

4. If the wish is to avoid having the cluster autoscaler relocate a pod
   that is part of a deployment, then it is more direct and sensible to
   use the `"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"`
   annotation. A sketch of how this looks on the pod template is shown
   below the list.

   ref: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
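
To connect points 3 and 4: the Cluster Autoscaler reads the annotation from the pod itself, so whichever mechanism sets it, it needs to end up in the deployment's pod template. A minimal sketch, with illustrative names, labels, and image tag rather than the chart's actual rendered output:

```yaml
# Illustrative hub Deployment with the safe-to-evict annotation on the
# pod template, which is where the Cluster Autoscaler reads it. Unlike
# a PDB, this only influences CA scale downs; it does not block a
# manual kubectl drain or an automated node upgrade.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub
spec:
  replicas: 1
  selector:
    matchLabels:
      component: hub
  template:
    metadata:
      labels:
        component: hub
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
        - name: hub
          image: jupyterhub/k8s-hub:example-tag  # illustrative tag
```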
@consideRatio consideRatio changed the title PDBs for hub/proxy/autohttps: default to disabled, config relocated, autohttps added Delete PDBs for hub/proxy/autohttps deployments that doesn't support HA Dec 10, 2020
@consideRatio consideRatio changed the title Delete PDBs for hub/proxy/autohttps deployments that doesn't support HA Delete PDBs for hub/proxy deployments that doesn't support HA Dec 10, 2020
@consideRatio
Member Author

Closed in favor of #1938
