
Investigate pdb setups #202

Closed
yuvipanda opened this issue Feb 5, 2021 · 12 comments


@yuvipanda
Member

pdb has been disabled - we should file an issue to investigate if we want it - jupyterhub/zero-to-jupyterhub-k8s#1938

I'm strongly opinionated after having researched and considered this in depth.

I find there is no point enabling a PDB to enforce either one replica available (minAvailable: 1) or zero replicas unavailable (maxUnavailable: 0) on pods that don't support running two separate replicas - this was the old z2jh default.

Note also that mybinder.org-deploy has explicitly opted out of the old z2jh default, and it systematically avoids that kind of behavior for all its other single-replica deployments as well. This was also discussed in jupyterhub/mybinder.org-deploy#1730 (comment).

Originally posted by @consideRatio in #194 (comment)
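
For context, the old z2jh default rendered roughly the following kind of PodDisruptionBudget for the hub. This is a sketch only, with an illustrative name and label selector rather than the chart's exact output:

apiVersion: policy/v1beta1   # policy/v1 on Kubernetes 1.21 and later
kind: PodDisruptionBudget
metadata:
  name: hub
spec:
  minAvailable: 1            # "1 replica availability"; maxUnavailable: 0 has the same effect here
  selector:
    matchLabels:
      component: hub         # illustrative selector for the single hub pod
# With only one hub replica, either setting makes the API server deny every
# voluntary eviction (i.e. node drain) of that pod.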

@yuvipanda
Member Author

So the behavior I want is that no node containing a hub or proxy pod should ever be automatically deleted (by autoscaling, automatic upgrades, etc.) without manual intervention. @consideRatio do you know how we can accomplish that?

@consideRatio
Member

Assuming all automatic deletions respect PDBs (which only work by denying eviction requests), then you do it in the way I was strongly opinionated against ;)

But if you specifically want to stop automatic deletion caused by cluster downscaling (and not by manual k8s node upgrades or automated maintenance windows), then it is preferable to block the downscaling by adding an annotation so the cluster autoscaler doesn't try to drain a node with such a pod on it in the first place.

Btw, I'm not confident a maintenance-window upgrade will respect the PDBs indefinitely; perhaps it only tries for 10 minutes or so and then proceeds anyway, for example.
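
Concretely, the annotation approach means the hub/proxy pods carry pod metadata like the following (a sketch; the z2jh values for setting it are shown a couple of comments further down):

metadata:
  annotations:
    # The cluster autoscaler will not scale down a node hosting a pod with this annotation:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"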

@yuvipanda
Member Author

So I think the hub won't be able to run with multiple replicas for a long, long time :) Until then, I wanna make sure that there's no unannounced downtime for hubs, as much as possible. I do that with the following:

  1. Run a regional cluster, so master upgrades don't disrupt hubs
  2. Run a core node pool
  3. Turn off autoupgrade for the core node pool, so all upgrades are done manually
  4. Enable PDBs, so the core nodes don't get autoscaled down if they have hub or proxy pods on them. I'm OK with autoscaling them for other pods that might exist on them (Prometheus, Grafana, etc.). This seems to work across cloud providers without any configuration. A sketch of what this looks like in chart values is below.
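
A minimal sketch of step 4 in chart values, assuming z2jh's hub.pdb / proxy.pdb settings (the exact keys can differ between chart versions):

hub:
  pdb:
    enabled: true
    minAvailable: 1    # with a single hub replica, this blocks eviction of the hub pod

proxy:
  pdb:
    enabled: true
    minAvailable: 1    # likewise for the proxy pod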

What do you think of this goal, and the current steps for accomplishing it?

@consideRatio
Member

consideRatio commented Feb 5, 2021

@yuvipanda I just want to say that I really appreciate your part of the dialogue about this with me. You just keep finding very constructive paths onwards even though it can be tricky when someone is strongly opinionated about something. I really appreciate that!

The goal

I wanna make sure that there's no unannounced downtime for hubs, as much as possible.

Sounds good!

How to achieve the goal

I like steps 1-3, though I would perhaps be willing to compromise on step 1 for cost efficiency, as regional clusters cost more on GKE nowadays I think. Downtime on the k8s api-server isn't so crucial, since the hub will still respond and the proxy will keep working etc.; I think the only drawback is that user servers can't be spawned during the downtime.

But regarding the PDBs in point 4, I suggest replacing them with the strategy of annotating the hub/proxy pods so the cluster autoscaler won't scale down the nodes they run on. By doing that, we can still ask the cloud provider for a manual node pool upgrade without also needing to manually delete the hub/proxy pods whenever that manually invoked node pool upgrade gets stuck.

hub:
  annotations:
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
    
proxy:
  annotations:
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"

Here is the reference on the annotation:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node

@yuvipanda
Member Author

yuvipanda commented Feb 5, 2021

Glad to work through this, @consideRatio :) The long-term solution is to make things HA, of course...

Is that annotation supported by upstream autoscaler?

@consideRatio
Member

@yuvipanda whoops, I updated my reply after you responded, adding some feedback on the goal itself and points 1-3. I also included a reference about the annotation.

I'm not 100% sure what all cloud providers use to autoscale nodes, but I think they typically use that cluster autoscaler, perhaps as their own fork with smaller changes to it.

@yuvipanda
Member Author

yuvipanda commented Feb 17, 2021

I've now come to the conclusion that @consideRatio is right, and we shouldn't have any PDBs for hub / proxy. We have a separate core node pool that won't scale down to 0, so the PDB isn't needed for that. Its presence causes issues in node upgrades, replacements, etc., so it's not worth keeping.

Currently, I see we have PDBs for user placeholders & user schedulers. Do you think we should keep those, @consideRatio?

Thank you for patiently working with me on this :)
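
For reference, opting back out in chart values looks roughly like this, assuming the same hub.pdb / proxy.pdb keys as in the sketch above (which matches what jupyterhub/zero-to-jupyterhub-k8s#1938 made the default):

hub:
  pdb:
    enabled: false   # rely on the dedicated core node pool instead of a PDB

proxy:
  pdb:
    enabled: false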

@consideRatio
Member

The PDB for user-placeholder pods is there to give permission for those pods to be disrupted without restrictions; it is a good default to keep.

The PDB for the user-scheduler, which is HA, can make sense to keep as long as we have the default of 2 replicas set on the user-scheduler deployment, as it ensures there is always one running.

I want to make the following change to the z2jh user-scheduler PDB though, so that it will be fine to have the PDB enabled by default even if you lower the replicas to 1.

     pdb:
       enabled: true
-      minAvailable: 1
+      maxUnavailable: 1
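
Spelled out as values, the proposed default would look roughly like this; the scheduling.userScheduler path is an assumption about the chart's layout, but the point is the behavior of maxUnavailable: 1:

scheduling:
  userScheduler:
    replicas: 2          # with 2 replicas, maxUnavailable: 1 keeps at least one scheduler running
    pdb:
      enabled: true
      maxUnavailable: 1  # with replicas: 1, this still permits evicting the single pod,
                         # so the PDB no longer blocks node drains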

@yuvipanda
Member Author

Makes sense! Thanks for the explanation :)

I want to run just one scheduler per cluster, since they'll all be configured the same! That's unrelated to this though.

@consideRatio
Member

consideRatio commented Feb 17, 2021

Yeah, it may be a bit overkill to have the user-scheduler run with two replicas. They use leader election, so in practice only one does the work while the other stands by to take over if the first one fails, I think.

I created a PR in z2jh btw: jupyterhub/zero-to-jupyterhub-k8s#2039

@yuvipanda
Member Author

The new defaults for z2jh are the right thing to do now. Thanks for pushing this through, @consideRatio!

@consideRatio
Member

It's a pleasure working with you @yuvipanda, I'm very thankful for your communication skills!
