
Doing a Kubernetes upgrade if JupyterHub is running will stall the Kubernetes upgrade #1575

Closed
kmhughes opened this issue Feb 13, 2020 · 11 comments

Comments

@kmhughes

I have installed the JupyterHub chart 0.8.2.

I was doing an automatic upgrade of my Kubernetes cluster on GKE. Eventually the upgrade stalled. After some research I found that the node running the JupyterHub hub pod was not updating: the Pod Disruption Budget for the hub pod requires a minimum of 1 instance of the pod, so the single instance of the hub could not be shut down, because that would leave 0 instances of the hub.

The deployment.yaml template for the hub only allows a single instance of the hub (replicas: 1), and there is no way to specify the desired number of replicas in the values.yaml file.

jupyterhub/templates/hub/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub
  labels:
    {{- include "jupyterhub.labels" . | nindent 4 }}
spec:
  replicas: 1  # <= NOTICE: cannot be changed

This means that if you enable a pod disruption budget, you cannot drain the node running the hub pod unless the budget allows a minimum of 0 available pods.
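For reference, the budget that blocks the drain looks roughly like this (a sketch based on chart 0.8.2; the exact API version and labels in your cluster may differ):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: hub
spec:
  minAvailable: 1          # with only 1 hub replica, no voluntary eviction is ever allowed
  selector:
    matchLabels:
      component: hub

With a single replica and minAvailable: 1, the eviction request issued by the node drain is rejected indefinitely, which is exactly the stall I saw during the GKE upgrade.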

Please make the number of replicas of the hub pod configurable in the values.yaml file. You might also want to document that enabling a pod disruption budget with only 1 replica will lead to this problem.
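For example, something along these lines in the deployment template (just a sketch; hub.replicas is a value name I am proposing, not one that exists today):

spec:
  replicas: {{ .Values.hub.replicas | default 1 }}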

I have not checked if the other pods have similar issues.

@manics manics transferred this issue from jupyterhub/helm-chart Feb 14, 2020
@manics
Member

manics commented Feb 14, 2020

This sounds like a valid use case. I've transferred this issue to the zero-to-jupyterhub repo, which is where development happens. https://github.com/jupyterhub/helm-chart is for storing and publishing the charts.

@kmhughes
Author

kmhughes commented Feb 14, 2020

@manics Thank you!

@kmhughes
Author

If you like, I could create a pull request with the change.

@betatim
Member

betatim commented Feb 14, 2020

Right now you cannot run more than one JupyterHub pod at a time because of shared state. Several people are interested in changing this, but it is a significant project, so it will take some time.

In the meantime I think the thing to do is to document that automatic upgrades, or other operations that require automatic pod relocation, won't work, and what to do in this situation. I think it is better to document this and tell users that they need to explicitly delete the pod. That way the admin has full control over when the brief interruption to users happens, by choosing the moment at which they delete the pod.
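The documented steps could be roughly the following (the namespace and label selector here are only illustrative; adjust them to your deployment):

# find the hub pod and the node it is running on
kubectl get pod -n jhub -l component=hub -o wide

# delete the hub pod yourself; the Deployment recreates it on another node
# and the stalled drain can then proceed
kubectl delete pod -n jhub -l component=hub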

@kmhughes
Author

OK, good to know. I may disable the pod disruption budget for now.

Definitely an important thing to document: I waited for over an hour for that node to drain before I figured out something was wrong. I haven't upgraded the cluster that often, so I didn't know whether to expect a lot of variability in drain times. Once JupyterHub was down, the remaining nodes drained like clockwork.
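For reference, I believe the chart exposes this as something like the following in the Helm config (the exact keys may vary between chart versions, so check your chart's values.yaml):

hub:
  pdb:
    enabled: false
proxy:
  pdb:
    enabled: false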

@betatim
Member

betatim commented Feb 17, 2020

Agreed on the documentation as it is a pitfall/source of frustration/confusion. Do you want to open a PR to add this?

@kmhughes
Author

kmhughes commented Feb 17, 2020 via email

@consideRatio
Member

consideRatio commented Oct 7, 2020

I'm starting to think that we should disable our PDBs by default unless we have two replicas, like we do for our user-scheduler pods. I don't see when a PDB does more good than harm for the hub, proxy, or autohttps pods at the moment.

@betatim what do you think at this point?

@consideRatio
Member

I don't see a clear win in making the hub replica count configurable: exposing it may suggest that increasing it makes sense, while running multiple replicas would typically lead to non-obvious runtime errors. And to temporarily set it to zero, kubectl edit deploy/hub is quicker than a helm upgrade, I figure.

Unless I see a strong benefit of a PDB for the hub/proxy/autohttps pods, I think they should be allowed to be disrupted during upgrades etc.; I find that easier than having to delete the pods manually. Anyone doing a k8s version upgrade of a JupyterHub deployment should be aware that it will cause disruptions, since ours is not an HA helm chart, so I'd say it's better to just disrupt quickly, without fuss.
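(A kubectl scale one-liner does the same thing as editing the replicas field by hand; the jhub namespace below is only an example:)

# temporarily remove the hub pod so that a node can be drained
kubectl scale deploy/hub --replicas=0 -n jhub

# bring it back once the node operation is done
kubectl scale deploy/hub --replicas=1 -n jhub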

@consideRatio
Member

Referencing a quote from #1649 (comment)

I also would like to have a simple active/passive based failover model so I can automate upgrades of nodes and clusters without having to modify the PDB or force evict the hub pod.

@consideRatio
Member

consideRatio commented Feb 16, 2021

This was closed by a change to the PDBs, together with #1934, in #1938!
