
Accelerate Scheduler with Cython, PyPy, or C #854

Closed
mrocklin opened this issue Feb 3, 2017 · 105 comments

Comments

@mrocklin
Member

mrocklin commented Feb 3, 2017

We are sometimes bound by the administrative overhead of the distributed scheduler. The scheduler is pure Python and a bundle of core data structures (lists, sets, dicts). It generally has an overhead of a few hundred microseconds per task. When graphs become large (hundreds of thousands of tasks) this overhead can become troublesome.

There are a few potential solutions:

  1. Use Cython in a few places
  2. Run the entire scheduler in PyPy (workers, clients, and user code can still be in CPython)
  3. Rewrite everything in C/Go/Julia/whatever

Generally efforts here have to be balanced with the fact that the scheduler will continue to change, and we're likely to continue writing it in Python, so any performance improvement would have the extra constraint that it can't add significant development inertia or friction.

Here are a couple of cProfile-able scripts that stress scheduler performance: https://gist.github.com/mrocklin/eb9ca64813f98946896ec646f0e4a43b
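For a rough sense of the kind of workload those scripts stress, here is a minimal many-small-tasks timing sketch (illustrative only, not the gist's contents; the in-process cluster and task count are assumptions):

import time

from distributed import Client

def inc(x):
    return x + 1

if __name__ == "__main__":
    client = Client(processes=False)  # local, in-process cluster for convenience
    n = 100_000
    start = time.perf_counter()
    futures = client.map(inc, range(n))
    client.gather(futures)
    elapsed = time.perf_counter() - start
    print(f"{n / elapsed:.0f} tasks/second")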

@mrocklin
Member Author

mrocklin commented Feb 3, 2017

Here is another, more real-world benchmark: https://gist.github.com/48b7c4b610db63b2ee816bd387b5a328

Though I plan to try to clean up performance on this one a bit in pure Python first.

@kszucs
Contributor

kszucs commented Feb 3, 2017

Any experience running with PyPy? How big is the difference?

I think it's a bit complicated to run the scheduler on PyPy and the workers with CPython (if you want numpy or pandas).

According to @GaelVaroquaux 's testimonial on Cython:

You guys rock! In scikit-learn, we have decided early on to do Cython, rather than C or C++. That decision has been a clear win because the code is way more maintainable. We have had to convince new contributors that Cython was better for them, but the readability of the code, and the capacity to support multiple Python versions, was worth it.

With Cython it's easier to achieve C-like performance without leaving Python. I've had great experience with it too.

@mrocklin
Member Author

mrocklin commented Feb 3, 2017

My perspective is the following:

The advantage of PyPy is that we would maintain a single codebase. We would also speed up everything for free (including tornado). I'm not normally excited about PyPy; I think that dask's distributed scheduler is an interesting exception. PyPy seems to be fairly common among web projects, and the distributed scheduler looks much more like a web project than a data science project.

The advantage of Cython is that users can install and run it normally using their normal tool chain. Also the community that works on Dask has more experience with Cython than with PyPy. The disadvantage is that we will need to maintain two copies of several functions to support non-Cython users, and things will grow out of date. I do not intend to make Cython a hard dependency.

The advantage of C is that it would force us to reconsider our data structures.

@Kobzol
Contributor

Kobzol commented Feb 3, 2017

I'm using PyPy for the client, workers, and scheduler and it works fine. I also tried running the scheduler on PyPy and the client with CPython, and that worked too.
The only problems arose when the workers and client didn't both use PyPy or CPython, because the serialization/deserialization wasn't compatible (which makes sense).

I think that accelerating the scheduler in this way is a good idea; I just wanted to note that, at least with PyPy, you can try it right away, without any changes to distributed itself :-)

@hussainsultan
Contributor

cc: @marshyski this might be of interest to you from a Go perspective.

@kszucs
Contributor

kszucs commented Feb 4, 2017

@mrocklin

The disadvantage is that we will need to maintain two copies of several functions to support non-Cython users

Did you mean non-CPython users? It doesn't seem impossible to stay compatible with PyPy without code duplication.

I guess Numba is out of scope - the scheduler depends (and probably will continue to depend) on hash-table structures, right?

@mrocklin
Member Author

mrocklin commented Feb 4, 2017

No, I mean non-Cython users. I don't intend to make Dask depend on Cython in the near term.

@mrocklin
Member Author

mrocklin commented Feb 4, 2017

Numba is unlikely to accelerate the scheduler code.

@GaelVaroquaux

GaelVaroquaux commented Feb 4, 2017 via email

@mrocklin
Member Author

mrocklin commented Feb 7, 2017

cc @anton-malakhov

@mrocklin
Member Author

Quick timing update showing PyPy vs CPython for creating typical dask graphs. We find that PyPy falls in the expected 2-5x speedup range we've seen with Cython in the past:

PyPy

>>>> import time
>>>> start = time.time(); d = {('x', i): (apply, lambda x: x + 1, [1, 2, 3, i], {}) for i in range(100000)}; end = time.time(); print(end - start)
0.11324095726

CPython

In [1]: import time

In [2]: start = time.time(); d = {('x', i): (apply, lambda x: x + 1, [1, 2, 3, i], {}) for i in range(100000)}; end = time.time(); print(end - start)
0.357743024826

Depending on what turns up as a bottleneck, I can imagine wanting to Cythonize core parts of dask.array as well (like top).

@GaelVaroquaux

GaelVaroquaux commented Feb 12, 2017 via email

@mrocklin
Member Author

Usually that's because there is some type information that should be added to make Cython faster.

I'm aware. To be clear, my previous comment was comparing PyPy to CPython, not Cython. We observe speedups similar to what we've seen when accelerating data-structure-dominated Python code with Cython in the past, namely 2-5x. In my experience, 100x speedups with Cython only occur when accelerating numeric code, where Python's dynamic dispatch is a much larger fraction of the cost.
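For contrast, a small hypothetical Cython snippet of the numeric case, where static typing removes most of the dynamic dispatch (illustrative only):

# A tight numeric loop: typing the index and accumulator lets Cython emit plain C.
cpdef double total(double[:] values):
    cdef Py_ssize_t i
    cdef double acc = 0.0
    for i in range(values.shape[0]):
        acc += values[i]
    return acc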

@GaelVaroquaux

OK, I had misunderstood your comment.

@honnibal

I'm not sure what you mean by "making Dask depend on Cython". Cython would only be a build-time dependency --- you'd either ship the generated C files via PyPI, Conda, etc., or the user would need a C compiler. It's similar to writing a C extension, just in a better language.

@mrocklin
Member Author

@honnibal of course you're correct. I should have said something like "making the dask development process depend on Cython and the dask source installation process depend on a C compiler" both of which add a non-trivial cost.

@jkterry1

jkterry1 commented Nov 1, 2017

@mrocklin

Couldn't you integrate the cython building into travis before shipping to PyPI, and have a dask and dask-c version in PyPI? That seems like it has all the advantages and none of the costs.

Also, you're completely right that Cython won't be faster than PyPy for your scheduler (or only a tiny bit faster), but my argument against relying on PyPy is that it isn't production-ready for, and/or doesn't support, a lot of important things, including much of the data science ecosystem that people would use Dask with. Cython is already compatible with everything and is used in massive deployments by Google and others.

I also have a related question to all this, which is the reason I stumbled across this thread in the first place:
I'm trying to have dask compute a giant custom DAG of small numeric operators (something Cython will give 100x improvements on). How could I best implement this with dask?

@mrocklin
Member Author

mrocklin commented Nov 1, 2017

Couldn't you integrate the cython building into travis before shipping to PyPI, and have a dask and dask-c version in PyPI?

Yes.

That seems like it has all the advantages and none of the costs.

This adds a significant cost in development and build maintenance.

Also, you're completely right that Cython won't be faster than PyPy for your scheduler (or only a tiny bit faster), but my argument against relying on PyPy is that it isn't production-ready for, and/or doesn't support, a lot of important things, including much of the data science ecosystem that people would use Dask with. Cython is already compatible with everything and is used in massive deployments by Google and others.

It is very hard (and often incorrect) to make claims about one being faster or slower than the other generally. Things are more complex than that.

I also have a related question to all this, which is the reason I stumbled across this thread in the first place: I'm trying to have dask compute a giant custom DAG of small numeric operators (something Cython will give 100x improvements on). How could I best implement this with dask?

I'm going to claim that this is unrelated. Please open a separate issue if you have a bug or ask a question on stack overflow (preferably with a minimal example) if you have a usage question.

@honnibal

honnibal commented Nov 1, 2017

@justinkterry You won't really see a speed benefit from Cython unless you plan out your data structures in C. That's not nearly as hard as people suggest, and I think it actually makes code better, not worse. But it does mean maintaining a separate dask-c fork is really costly.

As far as the Travis build process goes: Yeah, that does work. But the effort of automating the artifact release is really a lot. You have to use both Travis and Appveyor, and Travis's OSX stuff is not very nice, because the problem is hard. I'm also not sure Travis will stay so free for so long. I suspect they're losing a tonne of money.

In general the effort of shipping a Python C extension is really quite a lot. Sometimes I feel like it's harder to build, package and ship my NLP and ML libraries than it is to write them.

If it included C extensions, a library like Dask would have a build matrix with the following dimensions:

  • OS: Windows, Linux, OSX. There's now also tonnes of less standard Linuxes, because of Docker
  • Compiler: GCC, CLang, MinGW, MSVC (various), ICC
  • Python version: 2.7, 3.5, 3.6, 3.7
  • Installation: pip, conda, pip system installation, pip local directory
  • Artifact type: sdist, wheel, build repository
  • Architecture: 32 bit, 64 bit

That's over 1,000 combinations, so you can't test the matrix exhaustively. Building and shipping wheels for all combinations is also really difficult, so a lot of users will have to source install. This means that a lot of things that shouldn't matter do. For instance, the peak memory usage might spike during compilation for some compilers, bringing down small nodes (obviously an important use-case for Dask!). This might happen on some platforms, but not others --- and the breakage might be introduced in a point release when Dask upgraded the version of Cython used to generate the code.

Is it worth it? Well, for me the choice is between writing extensions and going off and doing something completely different. My libraries couldn't exist in pure Python. But for a very marginal benefit, I think you'd rather be shipping a pure-Python library.

Btw, I also think PyPy might not be that helpful? The workers would have to be running CPython, right? Most tasks you want to schedule with Dask will run poorly on PyPy.

@jkterry1

jkterry1 commented Nov 1, 2017

@honnibal Thank you very much for your detailed explanation; that actually helps a lot. What would you recommend doing if I want to call a bunch of functions in dask that would be highly accelerated by C? Just package each one with Cython and call it from the script using dask? My only concern with doing that is the time it takes to cross between Python and C, because it's a very large number of very short functions.

@mrocklin
Member Author

mrocklin commented Nov 1, 2017

What would you recommend doing if I want to call a bunch of functions in dask that would be highly accelerated by C?

@justinkterry I recommend raising a question on Stack Overflow using the #dask tag.

@fijal

fijal commented Nov 11, 2017

Hi Everyone.

So Matt and I did some benchmarks, looked at PyPy, and here are the takeaways:

  • PyPy gives about 40% speedup for free
  • The remaining time is spent, predominantly:
    • 15% doing bytearray += stuff, which relies on refcounting to be fast.
    • 10% looking up stuff in dictionaries. The real figure is probably higher as it would show up in the GC. Using objects here instead of small dicts with known keys would speed things up considerably.
    • 25% of the time in the GC - likely mostly the stuff above, but there is also quite a bit of list copying and resizing.

I think we can get another 2x speedup from PyPy with moderate effort. I can't promise I'll find time immediately, but if someone pesters me at some stage in the near to mid future, I can have a look.

Cheers,
fijal

@mrocklin
Member Author

This provides some motivation to arrange per-task information into objects. It is currently spread across ~20 or so dictionaries. This would have the extra advantage of maybe being clearer to new developers (although this is subjective). We would also have to see what effect this would have on CPython performance.
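As a rough sketch of that direction (attribute names here are illustrative, not the scheduler's actual fields):

# Today: per-task information spread across many dicts keyed by task key.
task_state = {"x-1": "waiting"}
task_priority = {"x-1": 0}
task_nbytes = {"x-1": None}

# Proposed: one object per task holding the same fields.
class TaskState:
    __slots__ = ("state", "priority", "nbytes")

    def __init__(self, state="waiting", priority=0, nbytes=None):
        self.state = state
        self.priority = priority
        self.nbytes = nbytes

tasks = {"x-1": TaskState()}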

A rewrite of this size is feasible, but is also something that we would want to discuss heavily before performing. Presumably there are a number of other improvements that we might want to implement at the same time.

I can't promise I'll find time immediately, but if someone pesters me at some stage in the near to mid future, I can have a look.

My guess is that your time would be better spent providing us with advice on how best to use PyPy during this process; I suspect it would be unpleasant for anyone not already familiar with Dask's task scheduler to actually perform this rewrite.

@njsmith

njsmith commented Nov 11, 2017

@mrocklin as a general note, if you're thinking about moving things from dicts to objects, then the attrs library is really excellent
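For illustration, a minimal attrs-based version of such an object might look like this (field names are hypothetical):

import attr

@attr.s(slots=True)
class TaskState:
    key = attr.ib()
    state = attr.ib(default="waiting")
    priority = attr.ib(default=0)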

@fijal

fijal commented Nov 13, 2017

So on the plus side, I added some small PyPy improvements so it's not as bad when handling bytearray (and it has nothing to do with the handling of refcounts). Where are the bytearrays constructed? It might be better (for CPython too) to create a list and use b"".join instead of having a gigantic bytearray.
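A tiny sketch of the pattern being suggested (the chunks are illustrative):

chunks = [b"a" * 100, b"b" * 100, b"c" * 100]  # illustrative payloads

# Growing a bytearray in place leans on CPython's refcount-based resize trick:
buf = bytearray()
for chunk in chunks:
    buf += chunk

# Collecting pieces in a list and joining once avoids repeated reallocation:
message = b"".join(chunks)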

As for how to get some basics. I do the following:

  • run the program with PYPYLOG=jit-summary:- pypy program. That creates a short summary. What to look for:

    • number of aborts. If there are a lot of aborts, we need to look where. PYPYLOG=jit:log would create a massive log, where ABORT strings can be found. ABORT because trace is too long is normal - don't worry. ABORT because quasi immutable is forced is bad. I looked through the traceback and found it comes from calling a function. Then Matt told me that function comes from cloudpickle, so presumably cloudpickle modifies some function parameters that are supposed to not be modified much.
    • tracing time and backend time. This gives you some indication of warmup - if tracing + backend is a significant part of the total time, either run for longer or PyPy is struggling to warm up.
  • I run PYPYLOG=log pypy program and then run PYTHONPATH=~/pypy python ~/pypy/rpython/tool/logparser.py print-summary - which gives me some basics: 8% of the time in GC (acceptable), 1% JIT tracing (good), the rest execution

  • python -m vmprof --web would upload the vmprof run.

At this stage, I think what we need to do is to take the core functions and make smaller benchmarks, otherwise it's a touch hard to do anything.

My hunch is that a lot of dicts and forest-like code spread across a few functions make it hard for the JIT to make sense of it, and it thus ends up too slow, but it's just a hunch for now.

@pitrou
Member

pitrou commented Nov 13, 2017

I'm not sure we want to go too deep into PyPy-specific tuning. The idea of converting the forest-of-dicts approach to a per-task object scheme sounds reasonable in principle, though I'm not sure how much it would speed things up. We also don't want to risk making CPython slower by accident, as it is our primary platform (and our users' as well).

@fijal

fijal commented Nov 13, 2017

Right, I'm pretty sure PyPy-specific tuning is a terrible idea. It's also rather unlikely to make a measurable difference on CPython. Measuring on PyPy, though, DOES make sense (especially since it runs almost 2x faster in the first place). Do you have benchmarks running on CPython all the time? Because if not, then "making it slower by accident" is a completely moot point.

@pitrou
Member

pitrou commented Nov 13, 2017

We do have a benchmarks suite (and some of us have individual benchmarks they run on a casual basis), but unfortunately we haven't automated its running (yet?).

@pitrou
Member

pitrou commented Nov 13, 2017

Where are the bytearrays constructed? It might be better (for CPython too) to create a list and use b"".join instead of having a gigantic bytearray.

If I'm guessing correctly, it is in Tornado and should be fixed by tornadoweb/tornado#2169

@fijal

fijal commented Nov 13, 2017

Replacing OrderedDict with dict helps a bit on PyPy. I presume we can do that always on PyPy and on CPython >= 3.6? It should help there too.
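A minimal sketch of how that swap could be gated (the guard shown here is an assumption, not a proposal for distributed's actual code):

import sys
import platform
from collections import OrderedDict

# Plain dicts preserve insertion order on PyPy and on CPython >= 3.6.
if platform.python_implementation() == "PyPy" or sys.version_info >= (3, 6):
    ordered_dict = dict
else:
    ordered_dict = OrderedDict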

@mrocklin
Member Author

mrocklin commented May 5, 2020

In all benchmarks, 7-15% of time is spent in dask.sizeof. Seems like a lot. Here are some flamegraphs:

cc @fjetter nothing to do here, but I wanted you to be aware.

It looks like this is happening whenever the scheduler writes a message. We check whether the message is large enough that we should serialize it in a separate thread. In the case of the scheduler all messages should already be pre-serialized, so this check should probably be skipped. This could be solved by passing through some offload= keyword or something similar, although doing this in a way that respects Comms that don't offload (like inproc) might require some care.
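A very rough sketch of the offload= idea (the helper names are hypothetical; this is not distributed's actual comm API):

from dask.sizeof import sizeof

OFFLOAD_THRESHOLD = 10_000_000  # bytes; illustrative value only

async def write_message(comm, msg, serialize, offload_to_thread, offload=True):
    # Hypothetical write path: when the caller passes offload=False because the
    # message is already pre-serialized, skip the sizeof-based check entirely.
    if offload and sizeof(msg) > OFFLOAD_THRESHOLD:
        frames = await offload_to_thread(serialize, msg)  # serialize off-thread
    else:
        frames = serialize(msg)  # serialize inline
    await comm.write(frames)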

For the other components I'm still curious what within those functions is slow. Is it data structure access? Manipulating Task objects? It's hard to dive more deeply to figure out what about CPython itself is slow in particular.

@mrocklin
Member Author

I wanted to check in here and bump this a bit. Two comments.

  1. It seems like a task that we can do right now is to come up with a benchmark suite that stresses the client/scheduler in a variety of ways. I'll list a few here (a rough sketch of the first appears after this list):

    • roundtrip time for a single task
    • bandwidth of many embarrassingly parallel tasks
    • fully sequential workloads
    • random graphs that are both sparsely and densely connected
    • dataframe shuffle
    • array rechunking with x.rechunk((1, 1000)).rechunk((1000, 1))

    Last time we spoke it sounded like this was something that the folks at NVIDIA felt comfortable contributing.

  2. The results by @shwina with Cythonization are very cool. I would be curious to know some of the following:

    • Are the speedups similar if we're not running cProfile, but just timing the results? (Sometimes cProfile can make pure-Python code a bit slower than it would otherwise be.)
    • Are there things that we can do in the scheduler/client to make Cythonizing more effective? If so, what?
    • What would our build process look like with a bit of Cython code?
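As a rough illustration of the first bullet above (roundtrip time for a single task), a minimal sketch; the in-process cluster and iteration count are assumptions:

import time

from distributed import Client

def inc(x):
    return x + 1

if __name__ == "__main__":
    client = Client(processes=False)
    client.submit(inc, 0, pure=False).result()  # warm up
    n = 100
    start = time.perf_counter()
    for i in range(n):
        client.submit(inc, i, pure=False).result()
    print(f"roundtrip: {(time.perf_counter() - start) / n * 1000:.2f} ms/task")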

@quasiben
Member

quasiben commented Sep 8, 2020

@Kobzol did you ever end up profiling things with perf, or did you only use py-spy in the end? No worries if you didn't do a perf run.

@Kobzol
Contributor

Kobzol commented Sep 8, 2020

I didn't use perf on Dask itself, just py-spy.

@mrocklin
Member Author

I have some questions about Cython.

In @shwina's work here, he Cythonized the TaskState and other classes in the Scheduler:

cdef public:
# === General description ===
cdef object actor
# Key name
cdef object key
# Key prefix (see key_split())
cdef object prefix
# How to run the task (None if pure data)
cdef object run_spec
# Alive dependents and dependencies
cdef object dependencies
cdef object dependents
# Compute priority
cdef object priority
# Restrictions
cdef object host_restrictions
cdef object worker_restrictions
cdef object resource_restrictions
cdef object loose_restrictions
# === Task state ===
cdef object _state
# Whether some dependencies were forgotten
cdef object has_lost_dependencies
# If in 'waiting' state, which tasks need to complete
# before we can run
cdef object waiting_on
# If in 'waiting' or 'processing' state, which tasks need us
# to complete before they can run
cdef object waiters
# If in 'processing' state, which worker we are processing on
cdef object processing_on
# If in 'memory' state, which workers have us
cdef object who_has
# Which clients want us
cdef object who_wants
cdef object exception
cdef object traceback
cdef object exception_blame
cdef object suspicious
cdef object retries
cdef object nbytes
cdef object type
cdef object group_key
cdef object group

This yielded only a modest speedup.

I'm curious if it is possible to take this further, and modify the types in the class itself away from object type into something more compound, like the following:

dependencies: Dict[str, TaskState]

More specifically, some technical questions:

Does Cython have the necessary infrastructure to understand compound types like this? Do we instead need to declare types on the various methods like Scheduler.transition_*?

It looks like the previous effort didn't attempt to Cythonize the Scheduler.transition_* methods. This might be a good place to start with future efforts?

cc @jakirkham who I think might be able to answer the Cython questions above and has, if I recall correctly, recent experience doing this with UCX-Py. Also cc @quasiben who has been doing profiling here recently.

@scoder

scoder commented Oct 26, 2020

You can use the dict type, but there is currently no dedicated support for containers with typed values like Dict[str, TaskState]. One of the issues is tracking usages of the "typing" module; another is integrating container item types into the type system. Neither is probably difficult to implement (there is precedent for both, e.g. C++ template support), but it hasn't been implemented yet.

Usually, people type the variables that they assign the results to. Or use type casts. But that introduces either a requirement for Cython syntax or some runtime overhead in pure Python (due to a function call to cython.cast()).

Even without looking at how the transition_*() methods are used, I can say that reducing call overhead is a very worthwhile goal for function/class-heavy code. Nothing you can do in Python comes close to a C call.
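As a small hypothetical illustration of that point, a cdef method reached through a statically typed reference is dispatched as a plain C call (names are illustrative):

cdef class SchedulerState:
    cdef int counter

    cdef void _transition(self, str key):
        # C-level call: no argument tuple packing, no dynamic dispatch.
        self.counter += 1

    def transition(self, str key):
        # Python-visible entry point; internally this is a cheap C call.
        self._transition(key)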

@kkraus14
Member

Given that the dict is used with a pre-defined set of keys, I think we could define a big if, elif, elif, elif, etc. block, which I believe Cython will optimize into a switch case for us. It would look a bit gross code-wise and be a bit unintuitive from a Python perspective.

@scoder

scoder commented Oct 26, 2020

which I believe Cython will optimize into a switch case for us

switch doesn't work with string values. And a Python dict lookup with a str (especially a literal) is actually plenty fast. I doubt that this is a bottleneck here.

@mrocklin
Member Author

Usually, people type the variables that they assign the results to. Or use type casts. But that introduces either a requirement for Cython syntax or some runtime overhead in pure Python (due to a function call to cython.cast()).

Thanks for the response @scoder . I think that we're becoming more comfortable with switching the entire file to Cython. So what I'm hearing is that we'll do something like the following:

cdef class TaskState:
    dependencies: dict

cdef transition_foo_bar(self, ts: TaskState):
    key: str
    value: TaskState
    for key, value in ts.dependencies.items():
        ...

This is more c-like in that we're declaring up-front the types of various variables used within a function. Am I right in understanding that Cython will use these type hints effectively when unpacking key, value in the for loop, or do we need to do more here?

@jakirkham
Member

I'm curious if it is possible to take this further, and modify the types in the class itself away from object type into something more compound, like the following:

dependencies: Dict[str, TaskState]

If we are comfortable moving to more typical Cython syntax, are we also comfortable making use of C++ objects in Cython? For example we could do things like this...

from libcpp.map cimport map
from libcpp.string cimport string

cdef class Obj:
    cdef map[string, int] data

Asking as this would allow us to make the kind of optimizations referred to above.

@mrocklin
Member Author

I would be inclined to do this incrementally and see what is needed. My guess is that there will be value in the attributes of a TaskState object being easy to manipulate in Python as well. This will probably be useful when we engage the networking side of the scheduler, or the Bokeh dashboard.

To me the following somewhat incremental path seems good:

  1. Cythonize the entire scheduler.py file, without using C++
  2. Iterate on that for a while and see how fast we can get while collecting all of the low hanging fruit
  3. If this is fast enough then we stop here.
  4. Split the scheduler.pyx file out into a state machine part and a networking part, which gets converted back into pure Python. In doing so we figure out the right split in the scheduler.
  5. Now that the computational parts are well separated and there is a nice protocol boundary we have a bit more freedom to play with different technologies like C++, Rust, ...

@pitrou
Member

pitrou commented Oct 26, 2020

I'm skeptical that using a C++ map would bring anything. First, you probably want an unordered_map (which is a hash table, rather than a tree as in map). Second, Python dicts are quite optimized as far as hash tables go (for example, the hash value of a string is interned and needn't be recomputed, and strings can often be compared by pointer...).

@jakirkham
Member

Sure, the code above was intended as a sample, not necessarily the optimal solution.

@pitrou
Member

pitrou commented Nov 4, 2020

Slightly orthogonal, but I think the biggest potential speedup for the scheduler would come from vectorizing scheduler operations. That is, instead of having dicts and sets that are iterated on, find a way to express the scheduling algorithm in terms of vector/matrix operations, and use Numpy to accelerate them. Regardless of the implementation language, implementing set operations in terms of hash table operations will always be costly.

(this has several implications, such as having to identify tasks and other entities by integer ids rather than arbitrary dict keys)
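A very rough sketch of what that could look like (integer task ids and the arrays shown are illustrative assumptions):

import numpy as np

n_tasks = 1_000_000
# One entry of state per task, indexed by integer task id.
remaining_deps = np.random.randint(0, 4, size=n_tasks)  # unfinished dependencies
completed = np.zeros(n_tasks, dtype=bool)

# Tasks that become runnable, found in one vectorized pass instead of
# iterating over dicts and sets.
runnable = np.flatnonzero((remaining_deps == 0) & ~completed)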

@fijal

fijal commented Nov 4, 2020 via email

@mrocklin
Member Author

mrocklin commented Nov 4, 2020

Thanks for the comments @pitrou and @fijal (also, it's good to hear from both of you after so long)

I agree that vectorization would probably push us well into the millions-of-tasks-per-second range. If you look at HPC schedulers in academia, you see this kind of throughput. At some point we'll have to think about this, and I look forward to that. I still think that we should be able to do better than the 5k tasks-per-second limit that we have today. Dicts are slow, yes, but not that slow.

@quasiben did some low-level profiling with NVIDIA profilers and found that the majority of our time in CPython wasn't in PyDict_GetItem, but had more to do with attribute access (I think). Previous Cythonization efforts left, I think, some performance on the table. I'd like to explore that to see if we can get up to 50k or so (which would be a huge win for us) before moving on to larger architectural changes.

@fijal

fijal commented Nov 4, 2020 via email

@mrocklin
Member Author

mrocklin commented Nov 4, 2020

Yeah, I'm hoping that we could get the same performance result with Cython. My hope (perhaps naive) is that there are large optimizations here that PyPy wasn't able to find, and that by writing this code manually and inspecting the generated C we might be able to find something that PyPy missed. I'm making that bet mostly based on the belief that dict/attribute access isn't that slow, but I'll admit that I'm making it in ignorance of how awesome PyPy is.

Also, to be clear, I'm not saying "we're going with Cython" but rather "we should explore diving more deeply into Cython and see how it goes"

@fijal

fijal commented Nov 4, 2020 via email

@mrocklin
Member Author

mrocklin commented Nov 4, 2020

Ah! Got it.

@jakirkham
Member

On the interpreter side, conda-forge and Anaconda both build CPython with profile-guided optimizations (PGO). So at least on that side, we are probably as well optimized as we can hope.

@jakirkham
Member

Another option to consider, if this were revisited, would be mypyc, which mypy dogfoods (i.e., mypy compiles itself with mypyc). This is likely a better fit than other approaches, as it uses standard Python type hints, which have increasingly been added to the codebase, and it targets server applications, which Distributed basically is.

That said, mypyc itself is considered alpha (though it has been around for a little while) ( mypyc/mypyc#780 ). Also (and this may be inexperience on my part), it seems to look at all files instead of just the one being requested, and it does not report all errors at once (so one incrementally fixes issues, which can be a bit slow). However, given the other points above, these may improve over time, so it's worth keeping an eye on.
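For context, mypyc compiles ordinary type-annotated Python, so (as a rough, hypothetical illustration) the kind of annotations already being added to the codebase are all it needs:

class TaskState:
    key: str
    priority: int

    def __init__(self, key: str, priority: int = 0) -> None:
        self.key = key
        self.priority = priority

def transition(ts: TaskState) -> str:
    # Plain annotated Python; mypyc uses these hints to generate a C extension.
    return ts.key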

@pitrou
Member

pitrou commented Jun 11, 2022

Given that it's considered alpha, I'm not sure you want to go through debugging potential mypyc bugs on the Distributed source code. Also, I don't see any benchmark results on non-trivial use cases, though I may be missing something.
