Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect dask.annotate during compute/persist calls #4306

Closed
mrocklin opened this issue Dec 3, 2020 · 5 comments
Closed

Respect dask.annotate during compute/persist calls #4306

mrocklin opened this issue Dec 3, 2020 · 5 comments

Comments

@mrocklin
Copy link
Member

mrocklin commented Dec 3, 2020

Now that dask/dask#6889 and #4279 are in (thanks @sjperkins !) we can use the dask.annotate context manager to mark layers built within a context

with dask.annotate(foo="bar"):
    x = da.ones(10)

Some folks may also want to use this same context when calling compute/persist

with dask.annotate(foo="baz"):
    x.compute()

Some thoughts on how we might do this:

  • This would probably require that we check dask.config.get("annotations") inside Client._graph_to_futures and then send those annotations up to the scheduler.
  • We would probably also want to handle those annotations in the Scheduler.update_graph_hlg or Scheduler.update_graph.
  • We'll need to figure out the relative priority when the layer annotations and the compute-time annotations disagree. My guess is that compute time annotations should take precedence.

@sjperkins does this interest you?

@jrbourbeau
Copy link
Member

We'll need to figure out the relative priority when the layer annotations and the compute-time annotations disagree. My guess is that compute time annotations should take precedence.

+1 to having compute-time annotations taking precedence over construction-time annotations

@sjperkins
Copy link
Member

Some thoughts on how we might do this:

* This would probably require that we check `dask.config.get("annotations")` inside `Client._graph_to_futures`

Agreed

and then send those annotations up to the scheduler.

* We would probably also want to handle those annotations in the `Scheduler.update_graph_hlg` or `Scheduler.update_graph`.

An easy way to do this would be to update existing HLG layers which have their own annotation transmission mechanism.

A generic "apply annotations to all keys transmitted to the scheduler" mechanism smells like it may have some gremlins/inefficiencies that would need careful consideration.

* We'll need to figure out the relative priority when the layer annotations and the compute-time annotations disagree.  My guess is that compute time annotations should take precedence.

Agreed. With the first approach above, this simply involves overriding the existing layer annotations.

@sjperkins does this interest you?

If the first approach is acceptable I think I could achieve a quick win.

@mrocklin
Copy link
Member Author

mrocklin commented Dec 3, 2020

An easy way to do this would be to update existing HLG layers which have their own annotation transmission mechanism.

I see two possible problems with this:

  1. Mutating the layers may unintended side effects
  2. Operations like submit and map may not be covered here

@sjperkins
Copy link
Member

I think I'll take a raincheck on this one as I need to currently work with SchedulerPlugins and scheduling on my side.

I am interested in documenting the interaction between Annotations and SchedulerPlugins and have opened an issue to track this: dask/dask#6928

@jrbourbeau
Copy link
Member

Closing as this has been implemented

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants