
Make KubeCluster more configurable #26

Merged · 17 commits · merged into master on Feb 5, 2018
Conversation

yuvipanda (Collaborator)

  • Add config options for memory / CPU limits and guarantees
  • Allow an easier escape hatch with extra_container_config and
    extra_pod_config, which are dicts that are deep-merged into the
    pod / container configuration
  • Minor other cleanups
  • Minor other cleanups

This gives us an easy escape hatch for the various configuration
items that we will not directly support in the constructor
We didn't have it in some places (like env) and did in others,
so let's standardize on *not* having them.
This allows it to bind on any available port
This is the kubernetes convention. Reversed order is docker
label convention.
yuvipanda mentioned this pull request on Jan 23, 2018
Forgot to git add
mrocklin (Member) commented Jan 23, 2018

I'm concerned that the current approach requires more configuration in the notebook than the average science user will be comfortable with. For example, the following code snippet is probably too much boilerplate for a science user to fully grok:

    cluster = KubeCluster(
        loop=loop,
        n_workers=0,
        env={"TEST": "HI"},
        extra_container_config={
            "env": [{"name": "BOO", "value": "FOO"}],
            "args": ["last-item"]
        }
    )

And yet it might also be necessary for everyone in a certain group (I expect research groups to share similar specs like this).

I'm still a fan of optionally placing the entire PodSpec in a YAML file if possible. This conforms to a pre-existing specification (Kubernetes itself), and is easy for a group tech leader to configure and distribute to their group.
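For concreteness, a pod template of the kind being discussed might look like the following, written here as a Python dict mirroring a standard Kubernetes Pod manifest (the image, args, and labels are illustrative):

    # Illustrative worker pod template, shaped like a standard k8s Pod manifest.
    pod_template = {
        'kind': 'Pod',
        'metadata': {'labels': {'app': 'dask'}},
        'spec': {
            'containers': [{
                'name': 'dask-worker',
                'image': 'daskdev/dask:latest',             # example image
                'args': ['dask-worker', '--nthreads', '2'],
            }],
            'restartPolicy': 'Never',
        },
    }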

        )
        return pod

    def _set_k8s_attribute(self, obj, attribute, value):
mrocklin (Member):

It looks like the only reference to self is to get at core_api. Perhaps this should be a standalone function that accepts an API object? That might also make it easier to test.
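A minimal sketch of the suggested refactor, with a deliberately simplified body (the real method also deserializes nested dict values, which is what the API client would be used for):

    import re

    def set_k8s_attribute(api_client, obj, attribute, value):
        # Standalone variant: the API client is passed in explicitly instead of
        # being read off self, which makes the function easy to test with a stub.
        # (api_client is unused in this simplified body; the full version would
        # use it to deserialize dict values into kubernetes model objects.)
        # Manifest keys are camelCase; the Python client uses snake_case.
        snake = re.sub(r'(?<!^)(?=[A-Z])', '_', attribute).lower()
        setattr(obj, snake, value)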

yuvipanda (Collaborator Author):

good idea, I'll do that!

I think the serialize function there could be a static method or classmethod; I'll file a bug upstream.

    assert pod.spec.containers[0].image_pull_policy == 'IfNotReady'
    assert pod.spec.containers[0].security_context == {
        'runAsUser': 0
    }
mrocklin (Member):

I'm very glad to see this working well :)

mrocklin (Member)

@rsignell-usgs @rabernat you two are probably early users for this. The work here is central to allowing people to customize the environments of their Dask clusters. This is probably the sort of thing you would do to help support your research groups.

mrocklin (Member)

We don't expect much of this to look familiar to you today, but it would be good to have your input on the kinds of interactions that you would be comfortable learning.

yuvipanda (Collaborator Author)

@mrocklin I don't disagree with you in any form or way :) I think it's only a question of factoring. We can easily build a YAML-based object on top of this. I don't think we should use YAML in the core of the functionality we have here, because the UI for interacting with this should be made available in many ways:

  1. GUI with a JupyterLab extension
  2. YAML files that users manually write / admins write and put in images for them
  3. Directly writing it in code

IMO the 'right' factoring is to keep the core agnostic to how users who don't want deep configurability might want to use it (since I strongly suspect that's going to be a JupyterLab UI rather than YAML! Everyone I've asked to write YAML eventually hates it, and we get lots of indent-related errors), and then work on the various GUI aspects.

I think the next step after this is to figure out both:

  1. What a YAML spec should look like, and how it should be distributed and loaded
  2. What a Jupyter UI for this should look like

Hope that makes sense.

rsignell-usgs

As @yuvipanda suspected, I would like all of them, and likely would use the GUI with a JupyterLab extension the most. But for sure others would prefer the YAML...

mrocklin (Member)

IMO the 'right' factoring is to keep the core agnostic to how users who don't want deep configurability might want to use it

Yeah, sorry, I wasn't trying to suggest that users must write YAML in order to configure their pods, merely that we're able to pass through PodSpecs as commonly defined in other established forms (like a standard YAML spec, or the Kubernetes Python API). I become nervous when we start adding keywords to configure particular parts of the PodSpec, because this list invariably tends to grow and require tending over time.

What a YAML spec should look like, and how it should be distributed and loaded

My thought is that this would just be the same as someone would define a PodSpec in a normal Kubernetes file. This way we know that our escape hatch is complete.

yuvipanda (Collaborator Author)

@mrocklin right, I think that is a very valid concern. So the extremes of that, as I see it, are:

  1. Require an entire PodSpec template to be passed in to the constructor; this template can come from YAML / code. We can modify it to enforce some invariants (around labels, for example), but mostly not.
  2. Have named constructor params for everything; the escape hatch is 'make a PR!'

I think both of these extremes are not ideal, and we should strike something of a middle ground, where the most basic setups are done by constructor arguments, hitting 60-80% of use cases, and the rest has an escape hatch. This has worked well for us in kubespawner.

Do you think this is a valid framework to think about it?

If so, then the question becomes 'where is the line?'. IMO that line is fairly clear, and we're pretty much at it. We might want the ability to add annotations, but that should be it for the constructor. Everything else should go through the 'escape hatch'. You can pass the direct output of yaml.load to the escape hatch now, and that should work...

It's good to write this down, however, and if you think this makes sense I can document it in the comments of the class.
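For instance, something like this should already work (a sketch: the YAML content, worker count, and import path are illustrative assumptions):

    import yaml
    from daskernetes import KubeCluster  # import path assumed

    # k8s-style YAML fed straight into the escape hatch
    extra_pod_config = yaml.safe_load("""
    tolerations:
      - key: dedicated
        operator: Equal
        value: dask
        effect: NoSchedule
    """)

    cluster = KubeCluster(
        n_workers=3,
        extra_pod_config=extra_pod_config,  # deep-merged into the pod config
    )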

rabernat

I'm trying to spin up on this to the point where I can give some feedback, but Kubernetes is a new language to me.

I understand that this PR adds options for customization of Dask workers. Does it also allow custom conda / pip packages to be installed? If so, how does that fit into the various options discussed above?

yuvipanda (Collaborator Author)

@rabernat I think it's still a little removed from that.

For me, the most useful thing to know would be 'what would your ideal workflow look like?'

yuvipanda (Collaborator Author)

@mrocklin upon more reflection, I wonder if:

  1. Require an entire PodSpec template to be passed in to the constructor; this template can come from YAML / code. We can modify it to enforce some invariants (around labels, for example), but mostly not.

Is actually not a bad idea at all. We can provide convenience methods on top of it (essentially what we have now in the KubeCluster constructor), but the core can be kept simple.

mrocklin (Member)

Is actually not a bad idea at all. We can provide convenience methods on top of it (essentially what we have now in the KubeCluster constructor), but the core can be kept simple.

Having this as an option would make me happy. Does this stop us from also accepting keywords that layer dictionaries on top of this base spec for particularly common cases? Which keywords to accept becomes a complex question, of course; I'm just curious whether it's feasible to have the best of both worlds.
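Layering keywords onto a base spec could plausibly be a small recursive dict merge; a sketch (the helper name is hypothetical):

    def deep_merge(base, overrides):
        """Return a copy of `base` with `overrides` recursively merged in."""
        merged = dict(base)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value  # note: lists are replaced, not merged
        return merged

    # e.g. layering a labels keyword on top of a base pod spec:
    base = {'spec': {'containers': [{'name': 'dask-worker'}]}}
    spec = deep_merge(base, {'metadata': {'labels': {'app': 'dask'}}})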

- Have a function, make_pod_spec, that accepts a bunch of stuff
  and gives back a pod template
- This allows us to add creating pod templates from YAML, JSON,
  etc. in the future
This is useful because not everyone's host has the same kind of Python
yuvipanda (Collaborator Author)

@mrocklin ok, I now have the core KubeCluster object only accepting a pod template, and there's a convenience function for generating said template. We can easily add a convenience function for generating it from YAML as well.

LMK how this looks! I got all tests to pass, although now you have to explicitly pass '--worker-image' to pytest to tell it what image to use (since my local Python is very different from the Python used in the upstream Dask images).

I think easy ways to improve here would be:

  1. Only image is required to make a podspec; we can read the image name from an environment variable by default if it isn't given. We can also make other params default to reading from environment variables, and populate those in our JupyterHub. This should address "Configure cluster defaults from environment variables" #28
  2. For a YAML loader, we can simply point to a path that has a YAML file. This file path can also default to a well-known location, or be read from an environment variable. This again can be set by admins.
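A rough sketch of the resulting flow (the make_pod_spec keyword names and the environment variable name are assumptions, not necessarily what this PR ships):

    import os
    from daskernetes import KubeCluster, make_pod_spec  # import path assumed

    # idea 1 above: default the image from an environment variable
    image = os.environ.get('DASK_WORKER_IMAGE', 'daskdev/dask:latest')

    pod_template = make_pod_spec(
        image=image,
        memory_limit='4G',  # keyword names illustrative
        cpu_limit=1,
    )
    cluster = KubeCluster(pod_template)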

yuvipanda (Collaborator Author)

@mrocklin ok, I've added the ability to read pod template info from YAML now.

This code could use a little more clarity around naming and whatnot, but hopefully this approach makes both of us happy? :)
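If it follows the classmethod pattern discussed later in the thread, usage presumably looks something like this (the method name and file path are assumptions):

    from daskernetes import KubeCluster  # import path assumed

    # worker-template.yaml holds an ordinary Kubernetes Pod manifest
    cluster = KubeCluster.from_yaml('worker-template.yaml')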

yuvipanda (Collaborator Author)

I also wonder if I should just move the extra_config merging code into its own function, since then it can be chained with both the YAML and non-YAML methods. We could also move the overrides back to the KubeCluster constructor.

    self.worker_labels['component'] = 'dask-worker'
    worker_pod_template.metadata.labels['dask.pydata.org/cluster-name'] = name
    worker_pod_template.metadata.labels['app'] = 'dask'
    worker_pod_template.metadata.labels['component'] = 'dask-worker'
mrocklin (Member):

Should we do a copy before mutating the given input?

Also, should we expect a Pod or a PodSpec? Do we want the user to specify an ObjectMeta?

yuvipanda (Collaborator Author):

Yeah, I think we should accept Pod objects rather than PodSpec, since that's what all the k8s docs generally specify. And having users be able to set annotations / labels is also quite useful.
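In terms of the Python client, that distinction looks like this (a sketch; the label, annotation, and image values are illustrative):

    from kubernetes import client

    # A full V1Pod carries metadata (labels / annotations) alongside the spec,
    # which is why accepting a Pod rather than a bare PodSpec is useful here.
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(
            labels={'app': 'dask'},
            annotations={'owner': 'research-group'},
        ),
        spec=client.V1PodSpec(containers=[
            client.V1Container(name='dask-worker', image='daskdev/dask:latest'),
        ]),
    )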

    self.worker_image = worker_image
    self.worker_labels = (worker_labels or {}).copy()
    self.threads_per_worker = threads_per_worker
    self.env = dict(env)
mrocklin (Member):

If we want to, we can continue supporting these kinds of common configurations by modifying the template in the _make_pod method. I don't know if this is desired, but it might be useful if there are parameters that we expect users to want to change?

yuvipanda (Collaborator Author):

I think I'm going to move the extra_* stuff to the constructor, and just have a 'common configuration' thing as another classmethod.

    @@ -55,27 +52,29 @@ class KubeCluster(object):
        """
        def __init__(
                self,
                worker_pod_template,
mrocklin (Member):

It might be nice to accept this as either a kubernetes library object, a dictionary, or a filename, and then normalize all options into a kubernetes object.

yuvipanda (Collaborator Author):

I made individual classmethods for these variants instead. I'm generally not a fan of doing polymorphism-style stuff in Python constructors, since I think this is clearer.
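A sketch of that factoring (make_pod_from_dict is the converter named in the commits below; its import location and the exact classmethod names are assumptions):

    import yaml

    from daskernetes.objects import make_pod_from_dict  # location assumed

    class KubeCluster:
        def __init__(self, worker_pod_template):
            # the core only ever deals with a kubernetes client V1Pod object
            self.worker_pod_template = worker_pod_template

        @classmethod
        def from_dict(cls, pod_spec):
            # k8s-style dict (camelCase keys, as in a manifest) -> V1Pod
            return cls(make_pod_from_dict(pod_spec))

        @classmethod
        def from_yaml(cls, path):
            with open(path) as f:
                return cls.from_dict(yaml.safe_load(f))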

    pod_template = apiclient.deserialize(
        _FakeResponse(yaml.safe_load(f)),
        client.V1Pod
    )
mrocklin (Member):

I wasn't able to make this approach work before. A number of components of the kubernetes object were silently dropped. I had a bunch of hacks on top of this to get my use case to work, and those hacks were pretty fragile. https://github.com/mrocklin/daskernetes/blob/deploy/daskernetes/core.py#L338-L374

mrocklin (Member):

This particular problem was my main blocker. I wasn't familiar enough with the kubernetes library to easily find a path out. Most other issues I'm able to handle.

mrocklin and others added 5 commits February 2, 2018 16:59
This includes failures in the deserialize method that we saw when
deploying pangeo.pydata.org
Add failing test for make_pod_from_dict
This makes things much clearer - you are either using
Python client objects and making an object yourself, or
using k8s-style JSON / YAML specs with a dictionary.
mrocklin (Member) commented Feb 5, 2018

Ah, I found the source of my snake_case issue. pod.to_dict() returns a dictionary with snake_case keys rather than camelCase. I had wrongly assumed that this would provide dictionaries that we could round-trip with deserialize.
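A quick illustration of the mismatch using the kubernetes Python client:

    from kubernetes import client

    container = client.V1Container(name='worker', image_pull_policy='IfNotPresent')
    # to_dict() yields snake_case keys, not the camelCase a manifest (and
    # deserialize) expects, so its output does not round-trip:
    print(container.to_dict()['image_pull_policy'])  # -> 'IfNotPresent'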

mrocklin merged commit 30e1e13 into master on Feb 5, 2018