
Drop support for cythonized scheduler #5685

Closed
fjetter opened this issue Jan 24, 2022 · 9 comments

@fjetter
Member

fjetter commented Jan 24, 2022

In a call last week we discussed dropping the support for cythonization of the scheduler code.

The cythonization project was an attempt to improve the scheduler performance. While it indeed delivered improved performance it was not as significant as originally hoped.

The cythonization annotations introduce a few subtle problems to maintenance and development

  • CI builds take longer
  • Annotations are not mypy / IDE friendly (see the annotation sketch below)
  • The code became increasingly unreadable
  • Perfectly fine Python code is not always compilable, requiring us to deal with Cython whether we want to or not
  • Development on the project stalled
  • There is no published / packaged build available
  • We do not have any benchmarking available and need to rely on a few individuals to "guesstimate" whether code changes might hurt the cythonization

Considering that the project is currently stalled, the costs are ongoing and the value for users is very limited, I suggest dropping support for cythonization.
We may want to pick up cythonization at a later point in time for specific applications again, for instance as specialized c extensions.
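
For context, the scheduler annotations used Cython's "pure Python mode". A generic illustration of that style (not the actual code in distributed/scheduler.py; the class here is made up) shows why these annotations read differently from the typing-module hints mypy expects:

```python
# Generic Cython "pure Python mode" sketch. TaskCounter is illustrative,
# not a class from distributed. It runs as plain Python when the module
# is not compiled; the cython package provides the shadow module.
import cython


@cython.cclass
class TaskCounter:
    count: cython.int  # C-level type, which mypy does not understand

    def __init__(self) -> None:
        self.count = 0

    @cython.ccall  # becomes a cpdef-style function when compiled
    def increment(self, n: cython.int) -> cython.int:
        self.count += n
        return self.count
```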

@quasiben, @jakirkham, @jrbourbeau, @jcrist

xref Drop PyPy #5681

@jakirkham
Member

Could someone please explain what is meant by "stalled"? AIUI this is done. We could of course always find more things to do. Though if there are other things we would like this group to do, maybe we can spend some time discussing those.

Are there any metrics for build times? In particular CI runtime with/without? Given we are only compiling one file, wouldn't expect this to take very long (and that hadn't been my experience either).

Yes, MyPy and Cython don't work well together today. This was one of the concerns I raised when MyPy was being explored. Personally I would still suggest not using it on a Cythonized file. Cython developers continue to work on typing that is more natural to Python, though this will likely take time.

Can someone please share some examples of what is deemed unreadable along with an explanation of why that is found to be the case?

Same question with compilation issues.

In terms of building a package, this could be done at this point. We held off initially as we didn't have CI coverage, which we have since added.

We did some benchmarking here. There have been other attempts to explore this. The main challenge here is a proper benchmark likely involves a multinode cluster, which is a non-trivial ask for a regularly run benchmark. Curious what others think about benchmarking. This goes beyond Cython as we really need a better idea of runtime and the impact of changes (though this will add to CI runtime).

@fjetter
Member Author

fjetter commented Jan 26, 2022

Could someone please explain what is meant by "stalled"? AIUI this is done. We could of course always find more things to do. Though if there are other things we would like this group to do, maybe we can spend some time discussing those.

We haven't done anything on this front in months. I doubt it is generating value for many people but I might be wrong, of course.
I am also worried that much of the original tuning is effectively reverted by now since we do not actively look into it and there have been many changes to the scheduler since.

It's probably fair to say that an initial increment in this effort is done but what comes next? The discussions around scheduler performance shifted and we haven't moved forward on this in a long time. If we indeed want to support it we should consider next steps. If not, we should consider dropping it in favour of ease of development and reduced maintenance.

Are there any metrics for build times? In particular CI runtime with/without? Given we are only compiling one file, wouldn't expect this to take very long (and that hadn't been my experience either).

We don't have any proper statistics for this. The only thing I can say is that installation + build takes about 1-2min looking at a few recent builds. That's about 3-10% of runtime (given that a job runs for about 20-30min). That fluctuates, of course. I can't say if it impacts test runtime itself. I wouldn't want this argument to get derailed due to this, that's just one more small cut.

Can someone please share some examples of what is deemed unreadable along with an explanation of why that is found to be the case?

I think this is a bit hard to produce. At this point it is actually very hard to distinguish what kind of annotations are in there for mypy reasons and which are there because cython requires it. I personally cannot make a strong case for this but it was raised as a concern.

Same question with compilation issues.

Also hard to pin down exactly. Searching the PRs we can at least filter out the ones that mention cython and quite a few mention some subtle problems impacting the code. https://github.com/dask/distributed/pulls?q=is%3Apr+cython
Most of these are minor issues like missing type annotations. Others require a minor rewrite in how the code is written.

In terms of building a package, this could be done at this point. We held off initially as we didn't have CI coverage, which we have since added.

I think this is coupled strongly to the outcome of this issue. If we continue to keep this code and want to support this moving forward, we have to publish this to make it accessible to users. If we don't, the value of keeping it is too small imo.

We did some benchmarking here. There have been other attempts to explore this. The main challenge here is a proper benchmark likely involves a multinode cluster, which is a non-trivial ask for a regularly run benchmark. Curious what others think about benchmarking. This goes beyond Cython as we really need a better idea of runtime and the impact of changes (though this will add to CI runtime).

I personally think that these changes have to be monitored automatically. Many of the cythonization changes can break easily, for instance if cython falls back to python code due to typing issues. We don't have proper visibility into this and obviously not the bandwidth to check this manually for every change.
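
One low-effort check that could run automatically is whether the scheduler module was built as a C extension at all. A minimal sketch (the helper name is hypothetical; it does not catch compiled code that silently falls back to generic Python object calls, which is what Cython's `cython -a` annotation output is for):

```python
import importlib


def is_compiled(module_name: str) -> bool:
    """Return True if the module was imported from a compiled extension."""
    mod = importlib.import_module(module_name)
    # Compiled extension modules end in .so (Linux/macOS) or .pyd (Windows).
    return getattr(mod, "__file__", "").endswith((".so", ".pyd"))


print(is_compiled("distributed.scheduler"))
```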

@jakirkham
Member

We haven't done anything on this front in months. I doubt it is generating value for many people but I might be wrong, of course. I am also worried that much of the original tuning is effectively reverted by now since we do not actively look into it and there have been many changes to the scheduler since.

There are various parts of the codebase that don't see attention for a period of time. This doesn't say much about their usefulness. Serializing NumPy arrays as an arbitrary example (the last change occurred there almost 1yr ago). Despite the lack of attention many people happily use this (whether they know it or not). Idk what can be determined about a piece of code's value using this metric.

That said, taking a step back, I think it is worth looking at how the optimization work has progressed. Initially we thought the way to improve things was to use HLGs to build up more complex task graphs with fewer nodes, ship these to the Scheduler, and then unpack and execute them on Workers. Though this would move more of the burden to the Scheduler, which was already having some issues. Hence we spent time optimizing the Scheduler. After a fair amount of optimization, the remaining pain points were communication and serialization. On the communication front there have been a number of improvements, the most recent of which was adding asyncio support with uvloop. More gains could likely be gotten by patching CPython. Serialization has seen a number of improvements, most notably removing extra unneeded passes over the data ( #4531 ). There are other improvements that could be made like ( #4699 ) or ( #4923 ), but perhaps this gets handled along with other serialization work ( #5581 ).

To summarize, we started down an optimization path, which led us to some different issues. Now that we have done some work on these other issues, it seems worthwhile to revisit the big picture and see where things stand.

It's probably fair to say that an initial increment in this effort is done but what comes next? The discussions around scheduler performance shifted and we haven't moved forward on this in a long time. If we indeed want to support it we should consider next steps.

Again, I'm not sure this is how we should be evaluating code's utility, but setting that aside.

A next step would be separating out SchedulerState into its own object so Scheduler doesn't inherit from it ( #5176 ). This includes the graph state and operations on it where Cythonization has been more beneficial. Ideally moving SchedulerState into a separate file along with relevant graph objects Task*, *State, etc. would also make the Cython/Python break cleaner.
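
As a rough sketch of that split (class and attribute names here are illustrative, not the actual distributed API), the idea is composition rather than inheritance:

```python
# Illustrative only: the real SchedulerState/Scheduler carry far more state.
class SchedulerState:
    """Task/graph state and the (Cython-friendly) operations on it."""

    def __init__(self) -> None:
        self.tasks: dict = {}

    def add_task(self, key: str) -> None:
        self.tasks[key] = {"state": "released"}


class Scheduler:
    """Networking, event handling, and policy; owns the state object."""

    def __init__(self) -> None:
        self.state = SchedulerState()
```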

Beyond that some of the objects in the Scheduler are opaque. This affects Python and Cython code in terms of general maintainability and performance. It also shows up in communication and serialization. Namely we use dicts and lists, but there is little clarity as to what should be in them, why, or how that information should be used. Ideally we would move to something that makes these objects more explicit. Personally would push for dataclasses, which are pretty lightweight, easy to read, autocompletion friendly, and optimizable with Cython ( cython/cython#3400 ).
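
As a small example of what that could look like (the message type and its fields are made up for illustration, not an actual distributed message):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskErredMsg:
    """Explicit record replacing an untyped dict payload."""

    key: str
    worker: str
    exception: bytes
    traceback: Optional[bytes] = None


msg = TaskErredMsg(key="x-123", worker="tcp://10.0.0.5:40331", exception=b"...")
```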

Outside of this am hoping the feedback you and others have provided here could be translated into a series of scoped issues that we could then spend time working on.

Admittedly a lot of this is cleanup work that requires a modest effort and will likely yield some performance improvements along the way. There wasn't much appetite for this clean up work before as things were largely driven by improving performance. Now that there is more appetite for cleanup work, it seems reasonable to spend some time on this.

We don't have any proper statistics for this. The only thing I can say is that installation + build takes about 1-2min looking at a few recent builds. That's about 3-10% of runtime (given that a job runs for about 20-30min). That fluctuates, of course. I can't say if it impacts test runtime itself. I wouldn't want this argument to get derailed due to this, that's just one more small cut.

Not to derail the argument, but I do think we should have concrete information here if we are going to make decisions based on it.

It seems a first step here would be measuring the runtime of different steps in CI and identifying the significant time consuming steps. Ideally this would be scoped into an issue someone could work on to address this concern.

I think this is a bit hard to produce. At this point it is actually very hard to distinguish what kind of annotations are in there for mypy reasons and which are there because cython requires it. I personally cannot make a strong case for this but it was raised as a concern.

Again it would help to have concrete examples to discuss. If there are particular issues, someone can work on addressing those. Perhaps whoever voiced this concern can share some examples?

Also hard to pin down exactly. Searching the PRs we can at least filter out the ones that mention cython and quite a few mention some subtle problems impacting the code. https://github.com/dask/distributed/pulls?q=is%3Apr+cython Most of these are minor issues like missing type annotations. Others require a minor rewrite in how the code is written.

Sorry to be a broken record here, but again concrete examples would help.

That said, I went through the search results. Here are the first 3 PRs that come up. This goes back to October 2021, so a few months. Numbered from the most recent PR to the oldest.

  1. The only mention of Cython is this comment ( Improve test coverage #5655 (comment) ), which isn't actually discussing Cython in the context of Distributed.
  2. There is a brief discussion of what Cython supports ( Handle errors on individual workers in run/broadcast #5590 (comment) ), which seems to have already been handled prior.
  3. We had a brief discussion ( Long running occupancy #5395 (comment) ) about a function that was moved.

Before that are the MyPy PRs, which we have already discussed above so will skip that.

Nothing is really sticking out to me, but feel free to point out anything I've missed.

I think this is coupled strongly to the outcome of this issue. If we continue to keep this code and want to support this moving forward, we have to publish this to make it accessible to users. If we don't, the value of keeping it is too small imo.

Agree that publishing packages seems like a reasonable ask. This is being tracked in issue ( #4442 ). Maybe we can discuss there more what we would like to see.

I personally think that these changes have to be monitored automatically. Many of the cythonization changes can break easily, for instance if cython falls back to python code due to typing issues. We don't have proper visibility into this and obviously not the bandwidth to check this manually for every change.

Honestly this is a bit separate from Cythonization itself. Dask really needs regularly run benchmarks regardless.

There has been work in dask-benchmarks on this previously, but IDK how well that has stayed up-to-date.

When developing the Cythonized Scheduler, we used these benchmarks.

RAPIDS also has GPU-BDB.

There are two challenges:

  1. Coming up with the right benchmarks
  2. Setting up & maintaining infrastructure that regularly does this

1 ends up being harder than for a project like, say, NumPy or Pandas, as Dask does a lot under the hood when performing any kind of computation: building a graph, optimizing the graph, shipping that to the Scheduler, distributing that workload to Workers, checking on the Workers' status and progress, managing memory, etc. Then there are questions about the size of the computation, the chunks, and the cluster it runs on. Trying to get the right benchmark that measures the right thing is tricky.

2 comes with its own set of challenges. First, where does this run? Perhaps someone can run it on a desktop they have lying around, though that tends towards a bus factor of 1. It could be run in the cloud, but then it becomes a question of who pays for it. Then comes the question of what the machine needs in order to run in terms of hardware, OS, drivers, etc. Multinode comes up a lot for workloads that scale and seems desirable for benchmarking, but again it is difficult to get access to (and possibly expensive!). Finally there is the question of frequency. While it would be nice to run on every PR, that is likely impractical. Similarly, every commit may prove tricky depending on what our expectations are. Running daily may be more doable.

Anyways if we want to discuss this further, which I'm supportive of, would suggest raising a new issue as it seems like a whole topic unto its own.

If not, we should consider dropping it in favour of ease of development and reduced maintenance.

Taking this out of order, but I hope it makes sense why.

I think we need a better handle on what the actual issues are. It seems they are captured in a general way, but it would help to find more concrete examples. ATM it doesn't appear we have them. If they are coming up regularly though, it shouldn't be too difficult to share examples here.

Once we have those examples, it may be easier to make some decisions based on them. Certainly one decision might be to drop. Another may simply be to fix the issues.

@mrocklin
Member

Bringing this back up. We have a group meeting next week. In that meeting I plan to advocate for reverting the Cythonization work, and hopefully (if people feel up for it) continuing that work in a separate branch.

In its current state Cythonization doesn't provide much value to Dask users, but does significantly complicate the codebase, which makes it harder to debug and work on. It has been in this state for a long while, and we don't seem to have a plan to get it out of this state without causing even further change to main (which I don't think is appropriate). With that in mind, I propose that we ...

  1. Merge SchedulerState back into Scheduler
  2. Drop Cython annotations
  3. Drop Cython CI builds and build infrastructure

I would still love to see experimentation happen in this space. I think that we would probably just want that experimentation to happen in either a separate branch, or in a way that made main more sensible and with less churn.

I'd love to engage in async conversation here a bit, see if we can nail down the parameters of this decision, and then make this decision next week.

@sjperkins
Member

+1 on this -- I have a test suite failing on Python 3.8 because inspect.iscoroutinefunction doesn't recognise a cythonised async function as async.

https://github.com/dask/distributed/runs/5812749893?check_suite_focus=true
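
For reference, the usual workaround is to inspect the object a call returns rather than the function's flags. A minimal sketch (the helper is hypothetical, not distributed's API, and it has the obvious cost of actually invoking the function):

```python
import inspect


def is_coroutine_function_fallback(func, *args, **kwargs) -> bool:
    """True if func looks async, even when compiled by Cython."""
    if inspect.iscoroutinefunction(func):
        return True
    # Cython-compiled async functions may not carry the CPython-level
    # coroutine flag, so call the function and inspect the result instead.
    obj = func(*args, **kwargs)
    if hasattr(obj, "close"):
        obj.close()  # avoid a "coroutine was never awaited" warning
    return inspect.isawaitable(obj)
```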

@fjetter
Member Author

fjetter commented Apr 7, 2022

In today's developer meeting we reached the decision to move forward with this and drop Cython.

Briefly summarized, our current prioritization does not allow us to focus on developing this further, and the maintenance introduces overhead. While there are benefits for certain workloads running on the cythonized code, we believe these benefits do not outweigh the cost of maintenance.
Our current priority is to improve stability and reduce complexity. We hope that removing Cython from our code base now will help us achieve this goal.

We may pick up Cython at a later point in time again if we believe it will benefit the project.

@martindurant
Member

@fjetter, @jakirkham, any appetite to take this on? I can do it, if not.

@jakirkham jakirkham mentioned this issue Apr 12, 2022
@quasiben
Member

quasiben commented May 4, 2022

Anything left for uncythonizing, or can we close this?

@martindurant
Member

Maybe small remaining comments and annotations, but certainly nothing blocking.
