Fix tolerations on gateway dask worker pods #567
Conversation
Sorry I overlooked #536, but the suggestions here are slightly different. For what it's worth, I think we should put schedulers in their own nodegroup, separate from users (just notebooks) or core (just the jupyterhub and dask pieces that are always running).
... and we might want to add some commits before merging to wrap up #496 (comment)
Noting that dask pods currently will still happily jump onto core nodes if room is available. This has come up before, with the suggestion of also adding taints to core nodes (currently they don't have any): pangeo-data/pangeo-stacks#59
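(For illustration, a core-node taint plus the matching toleration might look like the sketch below. The `hub.jupyter.org/dedicated` key follows the zero-to-jupyterhub convention and is an assumption here, since our core nodes currently carry no taints.)

```yaml
# Sketch: taint the core nodegroup so dask pods can no longer drift onto it,
# then tolerate that taint only on the core pods (hub, proxy, etc.).
#
# Assumed taint, applied to core nodes out-of-band (eksctl / gcloud):
#   hub.jupyter.org/dedicated=core:NoSchedule
#
# Matching toleration added to each core pod spec:
tolerations:
  - key: hub.jupyter.org/dedicated
    operator: Equal
    value: core
    effect: NoSchedule
```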
Where are you hoping that the dask scheduler pods end up? In #536 we're ensuring they end up in the regular (non-preemptible) pool, so the workers are on spot / preemptible nodes and the schedulers are on regular nodes.
@TomAugspurger AWS Spot versus GCE preemptible are a bit different (no 24-hour limit, as far as I understand). We've actually been running all nodes on Spot for a number of weeks now (even the core nodes). Typically these run for days and every now and again get rebooted. I guess I'm not too worried about the occasional couple-minute interruption; we're not really running any mission-critical workflows. Just to clarify, we're also installing https://github.com/aws/aws-node-termination-handler so that if a core node is interrupted we have two minutes to automatically launch a new node and move pods to it. We haven't been operating this way for very long, but so far so good!
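(For context, a minimal eksctl-style sketch of an all-Spot nodegroup; the instance types and sizes below are placeholders, not our actual configuration.)

```yaml
# Sketch: eksctl nodegroup running 100% on Spot. Pair this with
# aws-node-termination-handler so pods get drained on interruption.
nodeGroups:
  - name: core-spot
    minSize: 1
    maxSize: 3
    instancesDistribution:
      instanceTypes: ["m5.large", "m5a.large", "m4.large"]  # placeholder types
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0  # 0% on-demand => all Spot
      spotInstancePools: 3
```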
User nodes seem better than core. Or a separate nodegroup.
Good to know. I don't have a strong preference about which node pool schedulers end up on. My slight preference is keeping them on regular (non-spot) nodes, since other groups are likely to copy our configuration and I wouldn't call running the scheduler on a spot instance a best practice, at least for mission-critical things; the cost-benefit analysis will differ from group to group.
The worker changes here should be non-controversial though. I'll defer to others (cc @jhamman) on where best to put schedulers.
@TomAugspurger and @jhamman - My arguments for the user nodepool for now are:
Ultimately I think we want to decouple the gateway from jupyterhub altogether, correct? This would allow connecting to dask clusters in multiple regions, etc., in which case we eventually want distinct nodegroups for schedulers and workers. It seems this is the current scheduler pod config / resource requests:
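(Roughly, the scheduler pod spec boils down to a resources block like the sketch below; the numbers here are illustrative placeholders, not necessarily the actual chart defaults.)

```yaml
# Hypothetical scheduler pod resource requests -- placeholder values only,
# not the actual defaults from the dask-gateway chart.
resources:
  requests:
    cpu: "1"
    memory: 2G
  limits:
    cpu: "1"
    memory: 2G
```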
My 2 cents...
- The resources the scheduler pods need are distinct from the jupyter pods (we won't ever need a GPU for the scheduler pod), so we shouldn't put them together.
- The potential for poor scheduling on the core pool is a real concern, and we don't want to run into a situation where core pods are overly spread out.
- So, we should probably create a separate node pool for the schedulers. This should be tuned to support the type of resource requests our scheduler pods will make and have a similar spot/preemptible profile as the notebook pods (i.e. if your cluster puts notebooks on spot, it's probably okay to do the same for schedulers).

> Ultimately I think we want to decouple the gateway from jupyterhub altogether, correct?

This is possible now but we will still have one gateway per hub. The nice thing about this architecture is that we can connect to gateways outside the k8s cluster that the jhub is in.
@scottyhq do you have thoughts on a dedicated node pool for schedulers?
Seems like a good approach to me. I suppose we need to change the scheduler pod placement config accordingly.
Then we leave it up to each cluster to create a new nodegroup with this taint (see the sketch below). If the nodegroup doesn't exist, the scheduler pods will still go onto the untainted core nodes.
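(Something like the following; the `k8s.dask.org/dedicated=scheduler` key mirrors the worker-taint convention from dask-kubernetes and is an assumption here, as are the nodegroup name and sizes.)

```yaml
# Sketch: dedicated scheduler nodegroup (eksctl-style map taint syntax).
nodeGroups:
  - name: dask-scheduler
    minSize: 0
    maxSize: 2
    taints:
      k8s.dask.org/dedicated: "scheduler:NoSchedule"  # assumed key/value
```

with the gateway scheduler pod template tolerating it:

```yaml
tolerations:
  - key: k8s.dask.org/dedicated
    operator: Equal
    value: scheduler
    effect: NoSchedule
```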
Yeah, that sounds about right to me. I should have time to add that scheduler pool for GCP deployments today.
@TomAugspurger and @tjcrone I'm ready to merge this if that's okay. I think scheduler pods will still end up on core nodes, but we can fix that once dask/dask-kubernetes#164 is implemented. Sound good?
Yep, looks good.
Fixes dask/dask-gateway#220

- gateway-dask-worker pods get the same tolerations as the dask-kubernetes defaults (https://github.com/dask/dask-kubernetes/blob/b88ebb1f596ffd7b91299191e51fcd7b1df98a29/dask_kubernetes/objects.py#L215); those defaults are shown below
- put scheduler pods on user-notebook nodes (although we might want to add a new nodegroup?)
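For reference, as I read the linked objects.py, those defaults translate to the following pod tolerations (the underscore variant exists for platforms that disallow `/` in taint keys; worth double-checking against the pinned commit):

```yaml
tolerations:
  - key: k8s.dask.org/dedicated
    operator: Equal
    value: worker
    effect: NoSchedule
  - key: k8s.dask.org_dedicated
    operator: Equal
    value: worker
    effect: NoSchedule
```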
@jhamman @TomAugspurger @tjcrone