Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use annotations machinery for workers/priority/... #4347

Closed
wants to merge 1 commit into from

Conversation

mrocklin
Copy link
Member

Previously we had two systems to send per-task metadata like retries or
workers or priorities to the scheduler.

  1. Older system with explicit workers= keywords and expand_foo functions
  2. Newer system with annotations

The annotations system is nicer for a few reasons:

  1. It's more generic
  2. It's more consistent (there were some bugs in the expand foo
    functions, especially when dealing with collections)
  3. We ship values up on a per-layer basis rather than a per-key basis

This work-in-progress commit rips out the old system and uses the new system,
but it still missing a lot:

  1. It only implements this for the Client.compute method.
    We need to repeat this for persist, submit, and map
  2. It doesn't handle the allow_other_workers -> loose_restrictions
    conversion anywhere yet. (this will need to be added to the
    scheduler)

cc @sjperkins if you have time I'd love your thoughts on this approach. Also, if you agree with this approach and want to take over this PR that would also be welcome :)

Previously we had two systems to send per-task metadata like retries or
workers or priorities to the scheduler.

1.  Older system with explicit workers= keywords and expand_foo functions
2.  Newer system with annotations

The annotations system is nicer for a few reasons:

1.  It's more generic
2.  It's more consistent (there were some bugs in the expand foo
    functions, especially when dealing with collections)
3.  We ship values up on a per-layer basis rather than a per-key basis

This work-in-progress commit rips out the old system and uses the new system,
but it still missing a lot:

1.  It only implements this for the Client.compute method.
    We need to repeat this for persist, submit, and map
2.  It doesn't handle the allow_other_workers -> loose_restrictions
    conversion anywhere yet.  (this will need to be added to the
    scheduler)
@sjperkins
Copy link
Member

Firstly, I think placing the existing taxonomy into an annotation dictionary for scheduler transmission is the right way forward as it's a cleaner mechanism than having separate arguments to update-graph-hlg.

My initial reaction to global annotations (as described in #4306 (comment)):

with dask.annotate(foo="baz"):
    x.compute()

is that they shouldn't be supported as I've always thought of annotations as a fine-grained mechanism for describing a task. I think that "global " annotations should be discouraged. Perhaps my view is too narrow? If really desired the following pattern would also work:

with dask.annotate(foo="baz"):
   x = construct_graph()

x.compute()

However, the existing API should probably be supported for backwards compatibility (as is started in this PR)

client.compute(x, retries=3)

I may have time to look into this in more detail next week.

@mrocklin
Copy link
Member Author

This PR doesn't have an opinion on global vs local annotations in the compute call. I suggested this in a previous issue, but so far all we're doing here is replacing the internal machinery.

@mrocklin
Copy link
Member Author

mrocklin commented Jan 4, 2021

@ian-r-rose if you have time to carry on this PR that would be welcome.

@ian-r-rose
Copy link
Collaborator

@ian-r-rose if you have time to carry on this PR that would be welcome.

Sounds good

@dhirschfeld
Copy link
Contributor

Perhaps my view is too narrow?

One use case I'd like to enable is to keep track of different jobs running on the cluster:

with dask.annotate(job_name="weather-etl", job_id=uuid4().hex):
   x.compute()

I think that would require global annotations?

@jrbourbeau
Copy link
Member

Closing as this work was superseded by #4406

@jrbourbeau jrbourbeau closed this Feb 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants