Use annotations machinery for workers/priority/... #4347
Conversation
Previously we had two systems to send per-task metadata like retries or workers or priorities to the scheduler:

1. An older system with explicit `workers=` keywords and `expand_foo` functions
2. A newer system with annotations

The annotations system is nicer for a few reasons:

1. It's more generic
2. It's more consistent (there were some bugs in the `expand_foo` functions, especially when dealing with collections)
3. We ship values up on a per-layer basis rather than a per-key basis

This work-in-progress commit rips out the old system and uses the new system, but it is still missing a lot:

1. It only implements this for the `Client.compute` method. We need to repeat this for `persist`, `submit`, and `map`
2. It doesn't handle the `allow_other_workers` -> `loose_restrictions` conversion anywhere yet (this will need to be added to the scheduler)

cc @sjperkins if you have time I'd love your thoughts on this approach. Also, if you agree with this approach and want to take over this PR that would also be welcome :)
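To make the contrast between the two systems concrete, here is a hedged sketch (assuming a running distributed `Client`; `retries=` and `priority=` are existing `Client.compute` keywords, and `dask.annotate` is the newer annotations API):

```python
import dask
import dask.array as da
from distributed import Client

client = Client()  # spins up a local cluster

# Older system: per-call keywords, expanded per-key on the client side
x = da.ones((1000, 1000), chunks=(100, 100)).sum()
future = client.compute(x, retries=3, priority=10)

# Newer system: annotations recorded on the graph layers as they are
# built, shipped to the scheduler per-layer rather than per-key
with dask.annotate(retries=3, priority=10):
    y = da.ones((1000, 1000), chunks=(100, 100)).sum()
future2 = client.compute(y)
```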
Firstly, I think placing the existing taxonomy into an annotation dictionary for scheduler transmission is the right way forward, as it's a cleaner mechanism than having separate arguments to `update-graph-hlg`.

My initial reaction to global annotations (as described in #4306 (comment)):

```python
with dask.annotate(foo="baz"):
    x.compute()
```

is that they shouldn't be supported, as I've always thought of annotations as a fine-grained mechanism for describing a task. I think that "global" annotations should be discouraged. Perhaps my view is too narrow? If really desired, the following pattern would also work:

```python
with dask.annotate(foo="baz"):
    x = construct_graph()
    x.compute()
```

However, the existing API should probably be supported for backwards compatibility (as is started in this PR):

```python
client.compute(x, retries=3)
```

I may have time to look into this in more detail next week.
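As a side note on why the annotate block must wrap graph *construction*: `dask.annotate` records its values on the `HighLevelGraph` layers created inside the block, so annotating only around `.compute()` has no effect on a graph built earlier. A minimal sketch of inspecting this, assuming a dask version where `Layer.annotations` is exposed (layer names are internal and version-dependent):

```python
import dask
import dask.array as da

# Layers built inside the block carry the annotation;
# layers built outside it would have annotations == None.
with dask.annotate(retries=3):
    x = da.ones(10, chunks=5) + 1

hlg = x.__dask_graph__()  # a HighLevelGraph
for name, layer in hlg.layers.items():
    print(name, layer.annotations)  # e.g. {'retries': 3}
```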
This PR doesn't have an opinion on global vs. local annotations in the compute call. I suggested this in a previous issue, but so far all we're doing here is replacing the internal machinery.
@ian-r-rose, if you have time to carry on this PR, that would be welcome.
Sounds good
One use case I'd like to enable is to keep track of different jobs running on the cluster:

```python
with dask.annotate(job_name="weather-etl", job_id=uuid4().hex):
    x.compute()
```

I think that would require global annotations?
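Under the pattern suggested earlier in the thread (annotations are picked up at graph-construction time), a hedged sketch of this use case would wrap the graph-building step as well. `load_weather_data` is a hypothetical stand-in for whatever builds the collection:

```python
from uuid import uuid4

import dask
import dask.array as da

# Hypothetical helper; stands in for the real ETL graph construction.
def load_weather_data():
    return da.random.random((1000, 1000), chunks=(100, 100)).mean()

# Annotate around construction *and* compute, so the job metadata is
# recorded on every layer of the graph shipped to the scheduler.
with dask.annotate(job_name="weather-etl", job_id=uuid4().hex):
    x = load_weather_data()
    result = x.compute()
```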
Closing as this work was superseded by #4406