Protocol alignment #657
Thx for opening this, @davidbrochart. Does this imply that the consumer of the kernel messages over websocket (e.g. JupyterLab) will have to be updated to take this change into account? |
This PR currently works with jupyterlab/jupyterlab#11841, by replacing the format of the kernel wire protocol over websocket with the proposal. However we should probably provide the new protocol on a new websocket endpoint, since we should support both protocols. The web client could try the new endpoint and fall back to the "legacy" endpoint. |
Yes, see jupyterlab/jupyterlab#11841. |
I am working on a server extension that relies on getting a JSON message from the kernel manager. I am probably not the only one relying on that. Let's discuss tomorrow the potential impact on those kinds of scenarios. |
Maybe this new protocol should be opt-in rather than the default. I have not yet looked closely at the diff, but the opt-in option sounds safer to me. |
From what I can see in your server extension, you communicate with the kernel from the server:
So you are not concerned with this change. Or am I missing something? |
Let me look more at those changes by tomorrow. The linked extension expects to get from the kernel manager a complete json response in the server handler. |
We could use a query parameter to select the protocol instead of having a new websocket endpoint |
@vidartf mentioned the possibility today to use websocket protocols fields to select the protocol: https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/WebSocket https://html.spec.whatwg.org/multipage/web-sockets.html#the-websocket-interface, there was a question if this could be made backwards-compatible though. If that doesn't work for us, we could use a query parameter, and the absence of a query parameter implies the server default protocol. The protocol query parameter could be given as a string or a number. I think in order to be backwards-compatible, the server should default to the current behavior if no protocol is selected. We could change the "default" on a major version bump, similar to how the default pickle protocol is handled in Python. A client that uses these protocols must be used with a version of server that supports them (i.e. Another option brought up is to use a versioned endpoint, like |
If that option was chosen, would it make sense to generalize it to all endpoints? |
Potentially, yes |
I went ahead and tried @vidartf's idea of using websocket protocols, and it works pretty well. |
Very nice! |
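As a rough illustration of the selection logic discussed above (use the new protocol only when the client asks for it through the websocket protocols field, otherwise keep the current behavior), here is a minimal sketch. The subprotocol name and the function name are hypothetical and are not taken from this PR.

```python
from typing import Optional, Sequence

# Hypothetical subprotocol name, for illustration only (not from this PR).
NEW_PROTOCOL = "v1.kernel.example"


def select_kernel_protocol(
    offered: Sequence[str],
    supported: Sequence[str] = (NEW_PROTOCOL,),
) -> Optional[str]:
    """Return the first offered subprotocol that the server supports.

    Returning None means "no subprotocol selected", which the server
    would treat as the legacy (current) wire format.
    """
    for name in offered:
        if name in supported:
            return name
    return None
```

A client that offers nothing, or only names the server does not know, ends up on the legacy format, which keeps older clients working unchanged.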
I played with that optional parameter a few months ago
(source https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/WebSocket) My goal at that time was also to pass additional information to the server, although that information was not strictly a If we expand the idea, the concept of In this case, the client could ask for a protocol which is just The client could also just say More capabilities could be added. |
Thanks Eric, that's interesting. My choice of a version number is just one idea. |
I understand that supporting only the new protocol on the frontend is easier, as you remove code and have less to maintain. My conservative mind tells me, however, that it would be safer to add a new protocol rather than drop the existing one and replace it with another. This is what I am used to seeing elsewhere. I guess it would be good if JupyterLab could continue to support any server: the new ones, but also any others in the field that do not support the new capabilities. I can also imagine that deployments would pin jupyter-server to a version that does not support the new capability, for extension compatibility reasons, internal policy, etc. So having JupyterLab support all protocols would be an added value. The negotiation would be there to ensure that the server administrator can enforce what they want (the list of supported capabilities could be a trait). |
I pushed jupyterlab/jupyterlab@96b58b5. |
Thx for working on this @davidbrochart
I was thinking that the |
I see, but it seems that this would require a "stackable" serialization/deserialization. For instance, if you wanted to add |
Yeah, the composable approach may be a bit too far-fetched, but it is still worth discussing. If in the end we stick to the versioning approach, that is also great. |
Codecov Report
@@ Coverage Diff @@
## main #657 +/- ##
==========================================
- Coverage 77.76% 77.51% -0.25%
==========================================
Files 110 110
Lines 10415 10541 +126
Branches 1403 1426 +23
==========================================
+ Hits 8099 8171 +72
- Misses 1926 1970 +44
- Partials 390 400 +10
|
Just to document the details here, I think what you are saying is:
Yeah, I think probably your other option sounds good:
Nice! Perhaps that could also be automatically turned on with the existing |
FYI, list of websocket protocols Firefox understands and can show nicely: https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor/Inspecting_web_sockets#supported_ws_protocols |
There is no parsing but we need to build the JSON text like this (with the parts coming from ZMQ):
Yes, we would receive a (potentially big) JSON object that would need to be parsed to split the parts to send to ZMQ.
But |
One remaining question is about rate limitation. Unfortunately, turning it on makes the new protocol almost useless, since the messages from ZMQ need to be parsed before deciding to let them through or not. |
Good you brought this up. Two things
|
But how to enforce the widget data to be sent in binary messages? It seems to be left to the widget developer's good will. Even the documentation seems to encourage the data to be sent in the Jupyter protocol layer: "each comm pair defines their own message specification implemented inside the
I hadn't realized rate limitation could also drop widget messages. Maybe we could let them through by including |
I caution against turning rate limiting off by default without addressing the underlying issue. It was put in place to address a real pain point that was not uncommon (IIRC, overwhelming the communication channel). On the other hand, perhaps the efficiency improvements this protocol brings helps alleviate the problem? |
Actually, rate limitation would only impact jupyter_server, but not necessarily other Jupyter server implementations, like Jupyverse. It makes sense to have jupyter_server and Jupyverse follow different paths, so I'm fine having rate limitation turned on by default. |
Note: The current implementation does parse from zmq to websocket to compose the JSON in the form of dicts and lists. It could be smarter by manually creating the JSON string, but it does not do that. |
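For reference, "manually creating the JSON string" could look roughly like the sketch below. It assumes that each ZMQ part is already valid UTF-8 JSON text, and the function name is hypothetical; it is not the code in this PR.

```python
import json


def compose_json_from_parts(
    channel: str,
    header: bytes,
    parent_header: bytes,
    metadata: bytes,
    content: bytes,
) -> bytes:
    """Build the legacy JSON text by concatenation, without parsing the parts.

    Assumption: every part is already a valid UTF-8 encoded JSON value,
    so it can be spliced into the enclosing object as-is.
    """
    return b"".join(
        [
            b'{"channel":', json.dumps(channel).encode("utf-8"),
            b',"header":', header,
            b',"parent_header":', parent_header,
            b',"metadata":', metadata,
            b',"content":', content,
            b"}",
        ]
    )
```

Only the channel name goes through the JSON encoder; the four message parts are copied as raw bytes.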
I think this is ready to go. I would like to address a better way to do rate limitation at a later point. The current rate limitation is a bit clumsy in that it has to parse the messages to rate limit because some messages (status idle messages) are treated in a special way to
|
I think all of my concerns about the protocol have been resolved. Thanks again! |
Where are we planning to document this wire format? |
Probably in the jupyter_server documentation, rather than in the jupyter_client one? |
Should we merge this PR and open a new one for documentation? |
That sounds great to me. Thanks again for all your work on this! (Since it's such a big change, and I haven't been to Jupyter Server meetings recently, I don't plan to merge it myself, but I do give a +1 from my end to the current state of the protocol) |
Towards a better format for the kernel wire protocol over websocket
Context
Jupyter web clients (e.g. JupyterLab) communicate with Jupyter kernels over websockets, but kernels natively use Zero-MQ sockets as their transport layer. The Jupyter server acts as a bridge between kernels and web clients, and is responsible for forwarding messages to/from kernels over Zero-MQ sockets from/to web clients over websocket.
Current situation
Kernel wire protocol over Zero-MQ sockets
The kernel wire protocol over Zero-MQ sockets is well specified. It takes advantage of multipart messages, which allow a message to be decomposed into parts that are sent and received unmerged. The following table shows the message format (the beginning has been omitted for clarity):
Format of a kernel message over Zero-MQ sockets (indices refer to parts, not bytes).
The message can be deserialized simply by iterating over the parts:
Serializing is equally easy.
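A minimal sketch of both directions, assuming the part order header, parent_header, metadata, content, followed by any binary buffers (with the leading parts already stripped, as in the table above). The function names are hypothetical. Note that no JSON parsing is involved; the parts stay as raw bytes.

```python
from typing import Dict, List, Union

Message = Dict[str, Union[bytes, List[bytes]]]


def split_zmq_parts(parts: List[bytes]) -> Message:
    """Map the multipart frames to named fields without parsing them."""
    header, parent_header, metadata, content, *buffers = parts
    return {
        "header": header,
        "parent_header": parent_header,
        "metadata": metadata,
        "content": content,
        "buffers": buffers,
    }


def join_zmq_parts(msg: Message) -> List[bytes]:
    """Inverse operation: rebuild the list of frames to send."""
    return [
        msg["header"],
        msg["parent_header"],
        msg["metadata"],
        msg["content"],
        *msg["buffers"],
    ]
```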
Kernel wire protocol over websocket
A kernel message is serialized over websocket as follows:
Format of a kernel message over websocket (indices refer to bytes).
offset_0: position of the kernel message (msg) from the beginning of this message, in bytes.
offset_1: position of the first binary buffer (buffer_0) from the beginning of this message, in bytes (optional).
offset_2: position of the second binary buffer (buffer_1) from the beginning of this message, in bytes (optional).
msg: the kernel message, excluding binary buffers, as a UTF8-encoded stringified JSON.
buffer_0: first binary buffer (optional).
buffer_1: second binary buffer (optional).
The message can be deserialized by parsing msg as a JSON object (after decoding it to a string):
Then retrieving the channel name, and updating with the buffers, if any:
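The steps above can be sketched as follows. The binary prefix is an assumption here, since the format table is not reproduced: a 32-bit big-endian count of offsets, followed by that many 32-bit big-endian offsets. The function names are hypothetical.

```python
import json
import struct
from typing import Any, Dict


def serialize_legacy(msg: Dict[str, Any]) -> bytes:
    """Pack the JSON-encoded message and its buffers behind an offset table."""
    msg = dict(msg)  # shallow copy so the caller's dict is not modified
    buffers = [bytes(b) for b in msg.pop("buffers", [])]
    chunks = [json.dumps(msg).encode("utf-8")] + buffers
    n = len(chunks)
    # Assumed prefix: one count plus n offsets, 4 bytes each.
    offsets = [4 * (n + 1)]
    for chunk in chunks[:-1]:
        offsets.append(offsets[-1] + len(chunk))
    prefix = struct.pack("!" + "I" * (n + 1), n, *offsets)
    return prefix + b"".join(chunks)


def deserialize_legacy(data: bytes) -> Dict[str, Any]:
    """Slice the chunks out using the offsets, then JSON-parse msg."""
    (n,) = struct.unpack("!I", data[:4])
    offsets = list(struct.unpack("!" + "I" * n, data[4 : 4 * (n + 1)]))
    bounds = offsets + [len(data)]
    chunks = [data[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    msg = json.loads(chunks[0].decode("utf-8"))  # the costly step
    msg["buffers"] = chunks[1:]
    return msg
```

The call to `json.loads` on the whole message is the cost that the rest of this description aims to remove.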
Problem
The problem is that msg is serialized/deserialized using a JSON parser, which is very costly, considering that quite a lot of information could be included e.g. in the content. For instance, widgets not using binary buffers might pass data through this field.
Instead, the kernel wire protocol over websocket should remain as close as possible to the kernel wire protocol over Zero-MQ sockets, so that going from one to the other basically consists of moving blocks of raw memory, without any parsing involved whatsoever.
Proposal
Messages transiting over websockets cannot be decomposed into parts, as is the case for Zero-MQ sockets. They are received as a whole, so the message must include its own layout description. Unlike the current format, we propose to use a JSON-encoded layout, but it is so small and simple that we don't expect any performance loss. The new format of the kernel wire protocol over websocket is the following:
[Table not recovered: byte positions of the message fields, expressed in terms of layout_length and layout.offsets[0] through layout.offsets[4].]
Format of a kernel message over websocket (indices refer to bytes).
Where:
channel is a string indicating the name of the Zero-MQ socket ("shell", "control" or "iopub").
offsets consists of a list of at least 3 values, corresponding to the offsets of parent_header, metadata and content, respectively (header implicitly has offset 0). If more values are provided, they correspond to the offsets of the optional binary buffers.
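A minimal sketch of the proposed format, under two assumptions that are not stated in the text above: layout_length is encoded as a 2-byte little-endian unsigned integer, and the offsets are counted from the first byte of header. The function names are hypothetical.

```python
import json
import struct
from typing import List, Tuple


def serialize_proposed(channel: str, parts: List[bytes]) -> bytes:
    """parts: header, parent_header, metadata, content, then optional buffers."""
    offsets = []
    position = 0
    for part in parts[:-1]:
        position += len(part)
        offsets.append(position)  # start of the next part; header is at 0
    layout = json.dumps({"channel": channel, "offsets": offsets}).encode("utf-8")
    # Assumed encoding of layout_length: unsigned 16-bit, little-endian.
    return struct.pack("<H", len(layout)) + layout + b"".join(parts)


def deserialize_proposed(data: bytes) -> Tuple[str, List[bytes]]:
    """Only the small layout is JSON-parsed; the parts are sliced as raw bytes."""
    (layout_length,) = struct.unpack("<H", data[:2])
    layout = json.loads(data[2 : 2 + layout_length])
    payload = memoryview(data)[2 + layout_length :]
    bounds = [0] + layout["offsets"] + [len(payload)]
    parts = [bytes(payload[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]
    return layout["channel"], parts
```

The returned parts are in the same order as the Zero-MQ frames, so forwarding a message in either direction amounts to slicing or concatenating bytes.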