
Make KubeCluster more configurable #26

Merged · 17 commits · merged into master on Feb 5, 2018
Conversation

yuvipanda (Collaborator)

  • Add config options for memory / CPU limits and guarantees
  • Allow an easier escape hatch with extra_container_config and
    extra_pod_config, which are dicts that are deep-merged into the
    pod / container configuration
  • Minor other cleanups
  • Minor other cleanups

This gives us an easy escape hatch for the various configuration
items that we will not directly support in the constructor
We didn't have it in some places (like env) and did in others,
so let's standardize on *not* having them.
This allows it to bind on any available port
This is the kubernetes convention. Reversed order is docker
label convention.
yuvipanda mentioned this pull request on Jan 23, 2018
Forgot to git add
mrocklin (Member) commented Jan 23, 2018

I'm concerned that the current approach requires more configuration in the notebook than the average science user will be comfortable with. For example, the following code snippet is probably too much boilerplate for a science user to fully grok:

    cluster = KubeCluster(
        loop=loop,
        n_workers=0,
        env={"TEST": "HI"},
        extra_container_config={
            "env": [{"name": "BOO", "value": "FOO"}],
            "args": ["last-item"]
        }
    )

And yet it might also be necessary for everyone in a certain group (I expect research groups to share similar specs like this).

I'm still a fan of optionally placing the entire PodSpec in a YAML file if possible. This conforms to a pre-existing specification (Kubernetes itself), and is easy for a group tech leader to configure and distribute to their group.
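For concreteness, a pod template of the kind being discussed might look like the following, written here as a Python dict mirroring a standard Kubernetes Pod manifest (the image, args, and labels are illustrative):

    # Illustrative worker pod template, shaped like a standard k8s Pod manifest.
    pod_template = {
        'kind': 'Pod',
        'metadata': {'labels': {'app': 'dask'}},
        'spec': {
            'containers': [{
                'name': 'dask-worker',
                'image': 'daskdev/dask:latest',             # example image
                'args': ['dask-worker', '--nthreads', '2'],
            }],
            'restartPolicy': 'Never',
        },
    }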

        )
        return pod

    def _set_k8s_attribute(self, obj, attribute, value):
mrocklin (Member):

It looks like the only reference to self is to get at core_api. Perhaps this should be a standalone function that accepts an API object? That might also make it easier to test.
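A minimal sketch of the suggested refactor, with a deliberately simplified body (the real method also deserializes nested dict values, which is what the API client would be used for):

    import re

    def set_k8s_attribute(api_client, obj, attribute, value):
        # Standalone variant: the API client is passed in explicitly instead of
        # being read off self, which makes the function easy to test with a stub.
        # (api_client is unused in this simplified body; the full version would
        # use it to deserialize dict values into kubernetes model objects.)
        # Manifest keys are camelCase; the Python client uses snake_case.
        snake = re.sub(r'(?<!^)(?=[A-Z])', '_', attribute).lower()
        setattr(obj, snake, value)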

yuvipanda (Collaborator Author):

good idea, I'll do that!

I think the serialize function there could be a static method or classmethod; I'll file a bug upstream.

    assert pod.spec.containers[0].image_pull_policy == 'IfNotReady'
    assert pod.spec.containers[0].security_context == {
        'runAsUser': 0
    }
mrocklin (Member):

I'm very glad to see this working well :)

mrocklin (Member)

@rsignell-usgs @rabernat you two are probably early users for this. The work here is central to allowing people to customize the environments of their Dask clusters. This is probably the sort of thing you would do to help support your research groups.

mrocklin (Member)

We don't expect much of this to look familiar to you today, but it would be good to have your input on the kinds of interactions that you would be comfortable learning.

yuvipanda (Collaborator Author)

@mrocklin I don't disagree with you in any form or way :) I think it's only a question of factoring. We can easily build a YAML-based object on top of this. I don't think we should use YAML in the core of the functionality we have here, because the UI for interacting with this should be made available in many ways:

  1. GUI with a JupyterLab extension
  2. YAML files that users manually write / admins write and put in images for them
  3. Directly writing it in code

IMO the 'right' factoring is to keep the core agnostic to how users who don't want deep configurability might want to use it (since I strongly suspect that's going to be a JupyterLab UI rather than YAML! Everyone I've asked to write YAML eventually hates it, and we get lots of indent-related errors), and then work on the various GUI aspects.

I think the next step after this is to figure out both:

  1. What a YAML spec should look like, and how it should be distributed and loaded
  2. What a Jupyter UI for this should look like

Hope that makes sense.

rsignell-usgs

As @yuvipanda suspected, I would like all of them, and likely would use the GUI with a JupyterLab extension the most. But for sure others would prefer the YAML...

mrocklin (Member)

IMO the 'right' factoring is to keep the core agnostic to how users who don't want deep configurability might want to use it

Yeah, sorry, I wasn't trying to suggest that users must write YAML in order to configure their pods, merely that we're able to pass through PodSpecs as commonly defined in other established forms (like a standard YAML spec, or the Kubernetes Python API). I become nervous when we start adding keywords to configure particular parts of the PodSpec, because this list invariably tends to grow and require tending over time.

What a YAML spec should look like, and how it should be distributed and loaded

My thought is that this would just be the same as someone would define a PodSpec in a normal Kubernetes file. This way we know that our escape hatch is complete.

yuvipanda (Collaborator Author)

@mrocklin right, I think that is a very valid concern. So the extremes of that, as I see it, are:

  1. Require an entire PodSpec template to be passed in to the constructor; this template can come from YAML / code. We can modify it to enforce some invariants (around labels, for example), but mostly not.
  2. Have named constructor params for everything; the escape hatch is 'make a PR!'

I think both of these extremes are not ideal, and we should strike something of a middle ground, where the most basic setups are done by constructor arguments, hitting 60-80% of use cases, and the rest has an escape hatch. This has worked well for us in kubespawner.

Do you think this is a valid framework to think about it?

If so, then the question becomes 'where is the line?'. IMO that line is fairly clear, and we're pretty much at it. We might want the ability to add annotations, but that should be it for the constructor. Everything else should go through the 'escape hatch'. You can pass the direct output of yaml.load to the escape hatch now, and that should work...

It's good to write this down, however, and if you think this makes sense I can document it in the comments of the class.
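For instance, something like this should already work (a sketch: the YAML content, worker count, and import path are illustrative assumptions):

    import yaml
    from daskernetes import KubeCluster  # import path assumed

    # k8s-style YAML fed straight into the escape hatch
    extra_pod_config = yaml.safe_load("""
    tolerations:
      - key: dedicated
        operator: Equal
        value: dask
        effect: NoSchedule
    """)

    cluster = KubeCluster(
        n_workers=3,
        extra_pod_config=extra_pod_config,  # deep-merged into the pod config
    )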

rabernat

I'm trying to spin up on this to the point where I can give some feedback, but Kubernetes is a new language to me.

I understand that this PR adds options for customization of Dask workers. Does it also allow custom conda / pip packages to be installed? If so, how does that fit into the various options discussed above?

yuvipanda (Collaborator Author)

@rabernat I think it's still a little removed from that.

For me, the most useful thing to know would be 'what would your ideal workflow look like?'

yuvipanda (Collaborator Author)

@mrocklin upon more reflection, I wonder if:

  1. Require an entire PodSpec template to be passed in to the constructor; this template can come from YAML / code. We can modify it to enforce some invariants (around labels, for example), but mostly not.

Is actually not a bad idea at all. We can provide convenience methods on top of it (essentially what we have now in the KubeCluster constructor), but the core can be kept simple.

mrocklin (Member)

Is actually not a bad idea at all. We can provide convenience methods on top of it (essentially what we have now in the KubeCluster constructor), but the core can be kept simple.

Having this as an option would make me happy. Does this stop us from also accepting keywords that layer dictionaries on top of this base spec for particularly common cases? Which keywords to accept becomes a complex question, of course; I'm just curious whether it's feasible to have the best of both worlds.
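Layering keywords onto a base spec could plausibly be a small recursive dict merge; a sketch (the helper name is hypothetical):

    def deep_merge(base, overrides):
        """Return a copy of `base` with `overrides` recursively merged in."""
        merged = dict(base)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value  # note: lists are replaced, not merged
        return merged

    # e.g. layering a labels keyword on top of a base pod spec:
    base = {'spec': {'containers': [{'name': 'dask-worker'}]}}
    spec = deep_merge(base, {'metadata': {'labels': {'app': 'dask'}}})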

- Have a function, make_pod_spec, that accepts a bunch of stuff
  and gives back a pod template
- This allows us to add creating pod templates from YAML, JSON,
  etc. in the future
This is useful because not everyone's host has the same kind of Python
yuvipanda (Collaborator Author)

@mrocklin ok, I now have the core KubeCluster object only accepting a pod template, and there's a convenience function for generating said template. We can easily add a convenience function for generating it from YAML as well.

LMK how this looks! I got all tests to pass, although now you have to explicitly pass '--worker-image' to pytest to tell it what image to use (since my local Python is very different from the Python used in the upstream Dask images).

I think easy ways to improve here would be:

  1. Only image is required to make a podspec; we can read the image name from an environment variable by default if it isn't given. We can also make other params default to reading from environment variables, and populate those in our JupyterHub. This should address "Configure cluster defaults from environment variables" #28
  2. For a YAML loader, we can simply point to a path that has a YAML file. This file path can also default to a well-known location, or be read from an environment variable. This again can be set by admins.
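A rough sketch of the resulting flow (the make_pod_spec keyword names and the environment variable name are assumptions, not necessarily what this PR ships):

    import os
    from daskernetes import KubeCluster, make_pod_spec  # import path assumed

    # idea 1 above: default the image from an environment variable
    image = os.environ.get('DASK_WORKER_IMAGE', 'daskdev/dask:latest')

    pod_template = make_pod_spec(
        image=image,
        memory_limit='4G',  # keyword names illustrative
        cpu_limit=1,
    )
    cluster = KubeCluster(pod_template)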

yuvipanda (Collaborator Author)

@mrocklin ok, I've added the ability to read pod template info from YAML now.

This code could use a little more clarity around naming and whatnot, but hopefully this approach makes both of us happy? :)
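If it follows the classmethod pattern discussed later in the thread, usage presumably looks something like this (the method name and file path are assumptions):

    from daskernetes import KubeCluster  # import path assumed

    # worker-template.yaml holds an ordinary Kubernetes Pod manifest
    cluster = KubeCluster.from_yaml('worker-template.yaml')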

yuvipanda (Collaborator Author)

I also wonder if I should just move the extra_config merging code into its own function, since then it can be chained with both the YAML and non-YAML methods. We could also move the overrides back to the KubeCluster constructor.

    self.worker_labels['component'] = 'dask-worker'
    worker_pod_template.metadata.labels['dask.pydata.org/cluster-name'] = name
    worker_pod_template.metadata.labels['app'] = 'dask'
    worker_pod_template.metadata.labels['component'] = 'dask-worker'
mrocklin (Member):

Should we do a copy before mutating the given input?

Also, should we expect a Pod or a PodSpec? Do we want the user to specify an ObjectMeta?

yuvipanda (Collaborator Author):

Yeah, I think we should accept Pod objects rather than PodSpec, since that's what all the k8s docs generally specify. And having users be able to set annotations / labels is also quite useful.
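In terms of the Python client, that distinction looks like this (a sketch; the label, annotation, and image values are illustrative):

    from kubernetes import client

    # A full V1Pod carries metadata (labels / annotations) alongside the spec,
    # which is why accepting a Pod rather than a bare PodSpec is useful here.
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(
            labels={'app': 'dask'},
            annotations={'owner': 'research-group'},
        ),
        spec=client.V1PodSpec(containers=[
            client.V1Container(name='dask-worker', image='daskdev/dask:latest'),
        ]),
    )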

    self.worker_image = worker_image
    self.worker_labels = (worker_labels or {}).copy()
    self.threads_per_worker = threads_per_worker
    self.env = dict(env)
mrocklin (Member):

If we want to, we can continue supporting these kinds of common configurations by modifying the template in the _make_pod method. I don't know if this is desired, but it might be useful if there are parameters that we expect users to want to change?

yuvipanda (Collaborator Author):

I think I'm going to move the extra_* stuff to the constructor, and just have a 'common configuration' thing as another classmethod.

    @@ -55,27 +52,29 @@ class KubeCluster(object):
        """
        def __init__(
                self,
                worker_pod_template,
mrocklin (Member):

It might be nice to accept this as either a kubernetes library object, a dictionary, or a filename, and then normalize all options into a kubernetes object.

yuvipanda (Collaborator Author):

I made individual classmethods for these variants instead. I'm generally not a fan of doing polymorphism-style stuff in Python constructors, since I think this is clearer.
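A sketch of that factoring (make_pod_from_dict is the converter named in the commits below; its import location and the exact classmethod names are assumptions):

    import yaml

    from daskernetes.objects import make_pod_from_dict  # location assumed

    class KubeCluster:
        def __init__(self, worker_pod_template):
            # the core only ever deals with a kubernetes client V1Pod object
            self.worker_pod_template = worker_pod_template

        @classmethod
        def from_dict(cls, pod_spec):
            # k8s-style dict (camelCase keys, as in a manifest) -> V1Pod
            return cls(make_pod_from_dict(pod_spec))

        @classmethod
        def from_yaml(cls, path):
            with open(path) as f:
                return cls.from_dict(yaml.safe_load(f))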

    pod_template = apiclient.deserialize(
        _FakeResponse(yaml.safe_load(f)),
        client.V1Pod
    )
mrocklin (Member):

I wasn't able to make this approach work before. A number of components of the kubernetes object were silently dropped. I had a bunch of hacks on top of this to get my use case to work, and those hacks were pretty fragile. https://github.com/mrocklin/daskernetes/blob/deploy/daskernetes/core.py#L338-L374

mrocklin (Member):

This particular problem was my main blocker. I wasn't familiar enough with the kubernetes library to easily find a path out. Most other issues I'm able to handle.

mrocklin and others added 5 commits February 2, 2018 16:59
This includes failures in the deserialize method that we saw when
deploying pangeo.pydata.org
Add failing test for make_pod_from_dict
This makes things much clearer - you are either using
Python client objects and making an object yourself, or
using k8s-style JSON / YAML specs with a dictionary.
mrocklin (Member) commented Feb 5, 2018

Ah, I found the source of my snake_case issue. pod.to_dict() returns a dictionary with snake_case keys rather than camelCase. I had wrongly assumed that this would provide dictionaries that we could round-trip with deserialize.
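A quick illustration of the mismatch using the kubernetes Python client:

    from kubernetes import client

    container = client.V1Container(name='worker', image_pull_policy='IfNotPresent')
    # to_dict() yields snake_case keys, not the camelCase a manifest (and
    # deserialize) expects, so its output does not round-trip:
    print(container.to_dict()['image_pull_policy'])  # -> 'IfNotPresent'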

mrocklin merged commit 30e1e13 into master on Feb 5, 2018