Support one set of 0MQ channels per kernel #658

davidbrochart · 2022-01-13T09:43:43Z

Problem

jupyter_server creates a set of Zero-MQ channels for each client that connects to a kernel, but sockets can be a scarce resource on some platforms, and this can become an issue especially in the context of collaborative editing, when a lot of users work e.g. on the same notebook.

Proposed Solution

A solution to this problem was implemented in Jupyverse, where only one set of Zero-MQ channels is created for a given kernel. The session ID is used to demultiplex the messages from the kernel to the clients. This requires parsing the parent_header, which goes a little bit against the refactor of the websocket protocol, but the parent header is small so this shouldn't be much of an issue.
Maybe this could be supported in jupyter_server as well, as a configuration option?


echarles · 2022-01-13T09:52:12Z

...which goes a little bit against the refactor of the websocket protocol...

In terms of header parsing, this should not be too costly, I agree.

In the protocol alignment world as proposed in #657, what would such a single set of ZMQ channels per kernel mean? Would all connected server clients receive exactly the same messages?

Asking myself the same question in the current world.

Maybe putting a few diagrams with the kernel session id in the story will help.

davidbrochart · 2022-01-13T10:01:56Z

With one set of ZMQ channels per kernel, that would look like this:

                                      web client 0
kernel <-----> Jupyter server <-----> web client 1
          ^                      ^    web client 2
          |                      |
          |                      |
   Zero-MQ sockets           websocket

A web client only receives the messages that are addressed to it, except for the IOPub messages that are sent to all web clients.
It's actually independent of #657.
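
The routing rule described above could be sketched roughly as follows. This is only an illustration, not Jupyverse's actual code: the `recipients` function, the `BROADCAST_CHANNELS` set and the `clients` mapping are hypothetical names, and it assumes messages have already been parsed far enough to expose `parent_header` as a dict.

```python
# Hypothetical sketch: decide which web clients get a message coming from the kernel.
# Assumption: `clients` maps a client's session ID to its websocket-like object.

BROADCAST_CHANNELS = {"iopub"}  # IOPub messages go to every connected client


def recipients(channel, msg, clients):
    """Return the session IDs of the clients that should receive `msg`."""
    if channel in BROADCAST_CHANNELS:
        return list(clients)
    # A reply's parent_header is the header of the request that triggered it,
    # so its "session" field identifies the client that sent the request.
    session = msg.get("parent_header", {}).get("session")
    return [session] if session in clients else []


clients = {"session-0": object(), "session-1": object()}
reply = {"parent_header": {"session": "session-1"}}
print(recipients("shell", reply, clients))
print(recipients("iopub", reply, clients))
```

A message whose parent session is unknown is simply dropped in this sketch; a real server would have to decide what to do with it.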

davidbrochart · 2022-01-13T10:11:18Z

The current situation looks like:

       <----- - - - - - - - - ----> web client 0
kernel <----- Jupyter server -----> web client 1
       <----- - - - - - - - - ----> web client 2
          ^                     ^
          |                     |
   Zero-MQ sockets          websocket

There is a mapping of a set of ZMQ sockets to a websocket.
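
To give a rough idea of how the two layouts scale, here is a back-of-the-envelope count. The `CHANNELS` tuple and the `zmq_socket_count` helper are assumptions made for illustration; the exact set of channels opened per connection may differ.

```python
# Hypothetical sketch: number of ZMQ sockets the server opens towards one kernel.
# Assumption: one socket per channel, heartbeat left out.
CHANNELS = ("shell", "control", "stdin", "iopub")


def zmq_socket_count(n_clients, per_client, channels=CHANNELS):
    """Count server-side ZMQ sockets for one kernel.

    per_client=True  -> one set of channels per websocket (current situation)
    per_client=False -> one shared set of channels per kernel (proposal)
    """
    return len(channels) * (n_clients if per_client else 1)


print(zmq_socket_count(50, per_client=True))
print(zmq_socket_count(50, per_client=False))
```

With one set per websocket the count grows linearly with the number of clients, while a shared set stays constant.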

echarles · 2022-01-13T10:20:27Z

What about the session? Do they have separated sessions?

                                      web client 0 (session-0)
kernel <-----> Jupyter server <-----> web client 1 (session-1)
          ^                      ^    web client 2 (session-2)
          |                      |
          |                      |
   Zero-MQ sockets           websocket

If so, I don't see how the same message header can be created: all clients would receive the same message, but the session id would have to be different.

{
    "msg_id" : "str",
    "session" : "str",
    "username" : "str",
    "date": "str",
    "msg_type" : "str",
    "version" : "5.0"
}

If the session id were the same, this would require all clients to share the same session id, which I am not sure is what we want. We want to be able to segregate who is doing what.

davidbrochart · 2022-01-13T10:27:35Z

What about the session? Do they have separated sessions?

Yes, they have separate sessions, since this is what allows us to send messages from the kernel to the right web client.

If so, I don't see how the same message header can be created: all clients would receive the same message, but the session id would have to be different.

I don't follow, the message header is not the same.

echarles · 2022-01-13T10:30:01Z

I don't follow, the message header is not the same.

Do I understand correctly that the clients will all receive the same body, but with different headers?

davidbrochart · 2022-01-13T10:35:36Z

For IOPub messages yes, since its purpose is to be broadcast.
But not for peer-to-peer connections of course (shell, control, stdin), this feature wouldn't change the kernel protocol.

echarles · 2022-01-13T10:48:42Z

Thx for the explanations.

I can imagine cases where the server would enforce some security measures based on the kernel header (e.g. authorization based on the user field...).

I guess this would still be possible with the changes we are discussing here, thx for confirming.

With the other discussion around protocol alignment in #657, I have the same question, but I have more doubts about the feasibility of supporting such a feature (enforcing rules on the server side).

davidbrochart · 2022-01-13T10:56:51Z

I can imagine cases where the server would enforce some security measures based on the kernel header (e.g. authorization based on the user field...).

I guess this would still be possible with the changes we are discussing here, thx for confirming.

Yes, it would still be possible.

With the other discussion around protocol alignment in #657, I have the same question, but I have more doubts about the feasibility of supporting such a feature (enforcing rules on the server side).

It's also possible, nothing prevents the server from parsing messages, it would just not be done by default for performance reasons.
Actually, if you look at the PR, messages are parsed when needed, e.g. if checking message rate limits is enabled.

echarles · 2022-01-13T11:05:09Z

Thx for the above confirmations.

It's also possible, nothing prevents the server from parsing messages, it would just not be done by default for performance reasons.

My bet is that in any performance-demanding environment, there will almost always be a need for managed and controlled server-side processing (like rate limits, authorization, auditing, logging...), so the potential performance benefit looked at with #657 will vanish, as there would always be a need to parse some parts of the message.

davidbrochart · 2022-01-13T13:16:09Z

Let's not mix everything, this issue is independent of #657.

My bet is that in any performance-demanding environment, there will almost always be a need for managed and controlled server-side processing (like rate limits, authorization, auditing, logging...)

Server side processing doesn't necessarily mean parsing messages:

- rate limiting is about counting messages per second, or bytes per second, which can be done on raw data.
- the direction taken in Jupyter for authorization doesn't involve parsing kernel messages.
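
As an illustration of the rate-limiting point, a limiter can work on raw frames only. This is a minimal sketch under my own assumptions, not what jupyter_server does: `RawRateLimiter`, its parameters and the sliding-window policy are all hypothetical.

```python
import time
from collections import deque


class RawRateLimiter:
    """Sliding-window limiter that counts messages and bytes without parsing them."""

    def __init__(self, max_msgs, max_bytes, window=1.0, clock=time.monotonic):
        self.max_msgs = max_msgs
        self.max_bytes = max_bytes
        self.window = window
        self._clock = clock
        self._events = deque()  # (timestamp, size in bytes) of accepted messages
        self._bytes = 0

    def allow(self, frames):
        """Return True if a message made of raw `frames` (bytes) fits in the window."""
        now = self._clock()
        # Forget accepted messages that have left the window.
        while self._events and now - self._events[0][0] >= self.window:
            _, old_size = self._events.popleft()
            self._bytes -= old_size
        size = sum(len(frame) for frame in frames)
        if len(self._events) + 1 > self.max_msgs or self._bytes + size > self.max_bytes:
            return False
        self._events.append((now, size))
        self._bytes += size
        return True


limiter = RawRateLimiter(max_msgs=100, max_bytes=1_000_000)
print(limiter.allow([b"header", b"parent_header", b"metadata", b"content"]))
```

Only `len()` is called on the frames, so the cost does not depend on how expensive the content would be to deserialize.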

so the potential performance benefit looked at with #657 will vanish as there would be always a need to parse some parts of the message.

I disagree. The most costly parsing is for the content which can be big. The proposal in #657 separates header, parent_header, metadata, content and buffers into parts, so parsing e.g. parent_header can be done on its own without a big performance loss.

But again, comments about #657 should go there.

minrk · 2022-01-14T14:00:50Z

the parent header is small so this shouldn't be much of an issue.

Anticipation of cases like this is exactly why we send the headers as separate frames, and we have Session.deserialize(content=False), so you can deserialize only the generally-small headers to make routing decisions without having to deserialize the potentially large content. IPython Parallel's schedulers rely on this, for instance.
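
For readers unfamiliar with the wire format, here is a rough stdlib-only stand-in for what header-only deserialization looks like. It is not the jupyter_client implementation: `deserialize_headers` is a hypothetical helper, it assumes JSON packing, and it skips the HMAC signature check that a real Session performs.

```python
import json

DELIM = b"<IDS|MSG>"  # separates routing identities from the message frames


def deserialize_headers(frames):
    """Parse the small frames of a wire message and leave the content as raw bytes.

    Assumed frame layout: [identities..., DELIM, signature,
    header, parent_header, metadata, content, buffers...]
    """
    idx = frames.index(DELIM)
    signature, header, parent_header, metadata, content = frames[idx + 1 : idx + 6]
    return {
        "idents": frames[:idx],
        "header": json.loads(header),
        "parent_header": json.loads(parent_header),
        "metadata": json.loads(metadata),
        "content": content,  # not deserialized: may be large
        "buffers": frames[idx + 6 :],
    }


frames = [
    b"ident",
    DELIM,
    b"signature-not-checked-here",
    json.dumps({"msg_type": "execute_reply", "session": "kernel"}).encode(),
    json.dumps({"session": "session-1"}).encode(),
    b"{}",
    b'{"status": "ok"}',
]
msg = deserialize_headers(frames)
print(msg["parent_header"]["session"])
```

The routing decision only needs `msg["parent_header"]`, so the content frame can be forwarded untouched.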

This certainly makes the server more complicated, since it has to reimplement broadcast semantics, but seems sensible if folks have actually seen the number of connections to a single kernel already be a limiting factor.

davidbrochart · 2022-01-15T10:49:08Z

Let's keep this feature only in Jupyverse for now, since it's already implemented there.
If we see a need for it, then we could support it in jupyter_server.

davidbrochart added the enhancement label Jan 13, 2022

This was referenced Jan 13, 2022

Name initial core team and add documentation about roles jupyter-server/team-compass#14

Merged

Meeting Notes 2022 jupyter-server/team-compass#15

Closed
