Fix tolerations on gateway dask worker pods #567
Conversation
Sorry I overlooked #536, but the suggestions here are slightly different. For what it's worth, I think we should put schedulers in their own nodegroup, separate from users (just notebooks) or core (just the jupyterhub and dask pieces that are always running).
... and we might want to add some commits before merging to wrap up #496 (comment)
Noting that dask pods currently will still happily jump onto core nodes if room is available. This has come up before, with the suggestion of also adding taints to core nodes (currently they don't have any): pangeo-data/pangeo-stacks#59
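(For illustration, a core-node taint plus the matching toleration might look like the sketch below. The `hub.jupyter.org/dedicated` key follows the zero-to-jupyterhub convention and is an assumption here, since our core nodes currently carry no taints.)

```yaml
# Sketch: taint the core nodegroup so dask pods can no longer drift onto it,
# then tolerate that taint only on the core pods (hub, proxy, etc.).
#
# Assumed taint, applied to core nodes out-of-band (eksctl / gcloud):
#   hub.jupyter.org/dedicated=core:NoSchedule
#
# Matching toleration added to each core pod spec:
tolerations:
  - key: hub.jupyter.org/dedicated
    operator: Equal
    value: core
    effect: NoSchedule
```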
Where are you hoping that the dask scheduler pods end up? In #536 we're ensuring they end up in the regular (non-preemptible) pool, so the workers are on spot / preemptible nodes and the schedulers are on regular nodes.
@TomAugspurger AWS Spot versus GCE preemptible are a bit different (no 24-hour limit, as far as I understand). We've actually been running all nodes on Spot for a number of weeks now (even the core nodes). Typically these run for days and every now and again get rebooted. I guess I'm not too worried about the occasional couple-minute interruption; we're not really running any mission-critical workflows. Just to clarify, we're also installing https://github.com/aws/aws-node-termination-handler so that if a core node is interrupted we have two minutes to automatically launch a new node and move pods to it. We haven't been operating this way for very long, but so far so good!
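(For context, a minimal eksctl-style sketch of an all-Spot nodegroup; the instance types and sizes below are placeholders, not our actual configuration.)

```yaml
# Sketch: eksctl nodegroup running 100% on Spot. Pair this with
# aws-node-termination-handler so pods get drained on interruption.
nodeGroups:
  - name: core-spot
    minSize: 1
    maxSize: 3
    instancesDistribution:
      instanceTypes: ["m5.large", "m5a.large", "m4.large"]  # placeholder types
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0  # 0% on-demand => all Spot
      spotInstancePools: 3
```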
User nodes seem better than core. Or a separate nodegroup.
Good to know. I don't have a strong preference about which node pool schedulers end up on. My slight preference is keeping them on regular (non-spot) nodes, since other groups are likely to copy our configuration and I wouldn't call running the scheduler on a spot instance a best practice, at least for mission-critical things; the cost-benefit analysis will differ from group to group.
The worker changes here should be non-controversial though. I'll defer to others (cc @jhamman) on where best to put schedulers.
@TomAugspurger and @jhamman - My arguments for the user nodepool for now are:
Ultimately I think we want to decouple the gateway from jupyterhub altogether, correct? This would allow connecting to dask clusters in multiple regions, etc., in which case we eventually want distinct nodegroups for schedulers and workers. It seems this is the current scheduler pod config / resource requests:
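(Roughly, the scheduler pod spec boils down to a resources block like the sketch below; the numbers here are illustrative placeholders, not necessarily the actual chart defaults.)

```yaml
# Hypothetical scheduler pod resource requests -- placeholder values only,
# not the actual defaults from the dask-gateway chart.
resources:
  requests:
    cpu: "1"
    memory: 2G
  limits:
    cpu: "1"
    memory: 2G
```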
My 2 cents...
- The resources the scheduler pods need are distinct from the jupyter pods (we won't ever need a GPU for the scheduler pod), so we shouldn't put them together.
- The potential for poor scheduling on the core pool is a real concern, and we don't want to run into a situation where core pods are overly spread out.
- So, we should probably create a separate node pool for the schedulers. This should be tuned to support the type of resource requests our scheduler pods will make and have a similar spot/preemptible profile as the notebook pods (i.e. if your cluster puts notebooks on spot, it's probably okay to do the same for schedulers).

> Ultimately I think we want to decouple the gateway from jupyterhub altogether, correct?

This is possible now but we will still have one gateway per hub. The nice thing about this architecture is that we can connect to gateways outside the k8s cluster that the jhub is in.
@scottyhq do you have thoughts on a dedicated node pool for schedulers?
Seems like a good approach to me. I suppose we need to change the scheduler pod placement config accordingly.
Then we leave it up to each cluster to create a new nodegroup with this taint (see the sketch below). If the nodegroup doesn't exist, the scheduler pods will still go onto the untainted core nodes.
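(Something like the following; the `k8s.dask.org/dedicated=scheduler` key mirrors the worker-taint convention from dask-kubernetes and is an assumption here, as are the nodegroup name and sizes.)

```yaml
# Sketch: dedicated scheduler nodegroup (eksctl-style map taint syntax).
nodeGroups:
  - name: dask-scheduler
    minSize: 0
    maxSize: 2
    taints:
      k8s.dask.org/dedicated: "scheduler:NoSchedule"  # assumed key/value
```

with the gateway scheduler pod template tolerating it:

```yaml
tolerations:
  - key: k8s.dask.org/dedicated
    operator: Equal
    value: scheduler
    effect: NoSchedule
```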
Yeah, that sounds about right to me. I should have time to add that scheduler pool for GCP deployments today.
@TomAugspurger and @tjcrone I'm ready to merge this if that's okay. I think scheduler pods will still end up on core nodes, but we can fix that once dask/dask-kubernetes#164 is implemented. Sound good?
Yep, looks good.
Fixes dask/dask-gateway#220

- gateway-dask-worker pods get the same tolerations as the dask-kubernetes defaults (https://github.com/dask/dask-kubernetes/blob/b88ebb1f596ffd7b91299191e51fcd7b1df98a29/dask_kubernetes/objects.py#L215); those defaults are shown below
- put scheduler pods on user-notebook nodes (although we might want to add a new nodegroup?)
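For reference, as I read the linked objects.py, those defaults translate to the following pod tolerations (the underscore variant exists for platforms that disallow `/` in taint keys; worth double-checking against the pinned commit):

```yaml
tolerations:
  - key: k8s.dask.org/dedicated
    operator: Equal
    value: worker
    effect: NoSchedule
  - key: k8s.dask.org_dedicated
    operator: Equal
    value: worker
    effect: NoSchedule
```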
@jhamman @TomAugspurger @tjcrone