Ensure scheduler, worker pods on correct node pool #536
Conversation
Two changes here:

1. Ensure that worker pods land on preemptible nodes. This requires that the k8s cluster have nodes with the `k8s.dask.org_dedicated:worker` label; otherwise workers won't start. Do we want this in the base config used for all our hubs?
2. Ensure that scheduler pods are *not* on preemptible nodes. Right now, scheduler pods can end up on the preemptible nodes in the dask pool:

```
$ kubectl describe pod -n dev-staging dask-gateway-tomaugspurger-scheduler-df5cb0c595ca4c80adc12a82c84e7150 | grep pool
Node:  gke-dev-pangeo-io-cluster-dask-pool-f89fa71c-rh7b/10.128.0.82
  Normal  Scheduled  3m4s  default-scheduler  Successfully assigned dev-staging/dask-gateway-tomaugspurger-scheduler-df5cb0c595ca4c80adc12a82c84e7150 to gke-dev-pangeo-io-cluster-dask-pool-f89fa71c-rh7b
...
```

By removing the toleration, they won't end up there. I think this means they'll always end up in the `core-pool`, which is currently not set up to autoscale. We'll need to adjust that before merging this. @jhamman, does putting schedulers in the core-pool sound OK? If so, is it OK to autoscale it, or should we set up a dedicated `dask-scheduler` pool that's not preemptible?
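For reference, pinning worker pods along these lines might look like the following values fragment. This is a hypothetical sketch, not the chart's actual config: the key path mirrors the scheduler `extraPodConfig` snippet quoted later in this thread, and the label and taint names are assumed from the discussion above.

```yaml
# Hypothetical sketch of dask-gateway values pinning worker pods to the
# preemptible pool. Assumes the pool's nodes carry the
# k8s.dask.org_dedicated=worker label and a matching NoSchedule taint.
dask-gateway:
  gateway:
    clusterManager:
      worker:
        extraPodConfig:
          nodeSelector:
            k8s.dask.org_dedicated: worker
          tolerations:
            - key: "k8s.dask.org/dedicated"
              operator: "Equal"
              value: "worker"
              effect: "NoSchedule"
```

The `nodeSelector` keeps workers off the core pool, while the toleration lets them schedule onto the tainted worker pool.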
@TomAugspurger, I think because the branch you are trying to merge is so far behind staging, and there was one change to fix the hubploy issues (see #533), your PR will not pass the initial checks. I would consider rebasing your branch and trying again. You could also merge upstream/staging if you do not want to rebase.
Thanks, hopefully fixed now.
I made the core-pool autoscalable using the GCP web UI. I think this should be good to go. It'll need a bit of testing to make sure I got the taints/tolerations correct, to ensure that the scheduler doesn't end up in the worker or jupyter pools.
Thanks @TomAugspurger. This is currently blocked by #560, but once that's resolved it should be good to go.
Huh, so perhaps this PR is unnecessary now. In `pangeo`:

```yaml
dask-gateway:
  gateway:
    clusterManager:
      scheduler:
        extraPodConfig:
          tolerations:
            - key: "k8s.dask.org/dedicated"
              operator: "Equal"
              value: "scheduler"
              effect: "NoSchedule"
```

which IIUC matches the taint we added to the new scheduler node pool in #569.
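On the node side, the taint that this toleration matches would look roughly like the fragment below. This is a sketch for illustration: on GKE the taint is set when creating the node pool (e.g. via the `--node-taints` flag) rather than by editing node specs directly.

```yaml
# Hypothetical node spec fragment: the NoSchedule taint on the scheduler
# pool's nodes, matching the toleration's key, value, and effect.
spec:
  taints:
    - key: "k8s.dask.org/dedicated"
      value: "scheduler"
      effect: "NoSchedule"
```

With `NoSchedule`, only pods carrying this exact toleration (like the gateway's scheduler pods) can land on the tainted nodes.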
Closing, since nothing more should be required.