-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document communication protocol #3357
Comments
cc @mrocklin if you have time to comment. My understanding is that the protocol developed organically as we needed things.
Is deserialization necessary? I (possibly incorrectly) thought that the scheduler didn't deserialize things, but I may be thinking of something else. The documents in I suspect we'd be happy to take simplifications to the protocol as long as they don't harm the current implementation. |
Right, (de)serialization is an overloaded term in this scenario. I'll describe a specific use case. The However, with more complex pipelines (distributed pandas), we have observed that the definition of the task may also be a msgpack object, which is recursively serialized, with subheaders, possibly compression and other stuff. We thought that since the scheduler should be language agnostic, we could just take this msgpack blob and pass it on to the worker, but the I suspect that the property that the Dask scheduler is language agnostic might not be enforced strictly. The low level So to go forward, we would need to either document or simplify the protocol. If there is no documentation because the protocol grew organically, we will probably have to find some simple subset of the protocol and change the complex client code (for example the Dask messages generated from pandas tables) to use this simpler version of the protocol. Because without a documentation I'm not sure that we could reimplement the existing protocol in Rust, since there will probably be a ton of edge cases, intended or not (Python is super dynamic, so what is an edge case in Rust might not be a problem in Python). |
Thanks. I suspect that improvements to the docs would be welcome wholeheartedly. And changes to the protocol / the implementation that make things easier in other languages would be welcome as long they don't overly harm the current implementation's performance and readability. |
+1. Sorry for my lack of engagement here. I've been swamped
…On Wed, Jan 15, 2020 at 4:30 AM Tom Augspurger ***@***.***> wrote:
Thanks. I suspect that improvements to the docs would be welcome
wholeheartedly. And changes to the protocol / the implementation that make
things easier in other languages would be welcome as long they don't overly
harm the current implementation's performance and readability.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3357?email_source=notifications&email_token=AACKZTA3QP5UB4DVYXWFDCDQ536VXA5CNFSM4KELBFS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJAEXYQ#issuecomment-574639074>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTGVEW7DB5T2G6OCYY3Q536VXANCNFSM4KELBFSQ>
.
|
FYI I created a work-in-progress documentation of Dask messages. I couldn't find a better (terser) schema format, so I just used TypeScript. The documentation contains a subset of Dask messages (more or less exactly the subset that our Rust scheduler currently attempts to support). Apart from some inconsistent naming conventions and the Task definition structure (which is pretty... polymorphic :-) ), there are some small quirks that we had to replicate. For example the |
Hi! Together with @spirali we are attempting to rewrite the Dask scheduler in Rust (#3139). We had some initial success, but when we moved to more advanced Dask programs (for example distributed pandas), the communication protocol between the clients/workers and the scheduler became very difficult to handle.
Right now we are facing two issues:
For example, we thought that the
compute-task
message which is sent from the scheduler to workers needs to havefunction
andargs
parameters which contain the necessary code and data to run on the worker. However, when running the following Python script:the scheduler sends some
compute-task
messages to workers which do not contain thefunction
andargs
keys:Another problem is that the definition of the task itself (inside
tasks
dictionary inupdate-graph
messages) may be serialized. Even if it only uses msgpack, the serialized format can be quite complex and we are not sure how to reimplement the (de)serialization in Rust. Without a proper documentation of the format we can only guess how to implement the scheduler properly.I suspect that there may be some legacy cruft hidden inside the Dask communication protocol (both the serialization format and the message API itself) and it may be worth it to document it properly and possibly make some simplifications. If this documentation exists somewhere already please let us know. What do you think?
The text was updated successfully, but these errors were encountered: