Many widgets fail to render #534
I think this is a good opportunity to refactor this voila manager to allow progressive rendering of widgets (create widgets as they come in, and even try to render them as they come in). We could also think about connecting the websocket/kernel as early as possible, rather than waiting until the execution is done.
@maartenbreddels - do you want to do a video call to discuss this? It sounds like a tricky issue.
Notebook 6.0.2 seems to have the same issue: it worked once with n=9, but usually does not, and JupyterLab 1.2.4 also does not render with n=8. So I don't think this is a voila issue. Voila sees the 2558 comms/widgets and requests all their info, but gets only ~2300 replies.
I'm on whereby!
I wonder if the problem is overwhelming the websocket buffer. All of these requests are made simultaneously. I don't know what the websocket buffer size is, but perhaps it is overflowing; see https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/send
Can you check what the websocket bufferedAmount is? See also https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/bufferedAmount
@maartenbreddels and I talked about this more, and it seems that the problem is here: Lines 117 to 140 in 8557d9d
In particular, for his reproduction example, this gets back 2558 comm ids: Line 118 in 8557d9d
But this code (Lines 120 to 123 in 8557d9d), which requests the state for each of those comms, does not get all the replies back.
He can reproduce this sort of issue in the classic notebook and jlab by creating all these widgets, then refreshing the page (which triggers the same sort of logic to request all widget state from the kernel to populate the client widget manager).
I think the key is to first narrow down whether the problem is in getting the update requests to the kernel, in the kernel itself, or in getting the updates back from the kernel. @maartenbreddels decided to instrument the kernel to track these state requests, to see if the kernel is getting all the requests, and if it thinks it sent all the updates. That should narrow down the problem. One other quick fix might be to just break up the requests at Line 120 in 8557d9d into smaller batches.
Since @maartenbreddels asked - here is where 2558 comes from...
The comms are limited to 1000 msgs/sec by default in the Notebook. This might be the reason for this issue.
Thanks, that helped me sleep :) Ok, I've narrowed down the issue a bit by patching jupyter_server like this:
I can conclude the following:
I've analyzed the request_order dict, and in one run the first 1165 request_state calls got answered, the next 130 got dropped, the next 500 (or 501) got answered, then 459 got dropped, and the last 305 got dropped.
From the kernel: I monkey patch ipykernel and the Comm class to see what we do at the kernel side:
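(The actual monkey patch isn't included above; a rough sketch of this kind of instrumentation, assuming ipykernel's Comm class, could look like the following. The counters and their names are purely illustrative.)

```python
# Rough sketch only, not the exact patch used in this thread.
from ipykernel.comm import Comm

received = {'comm_msgs': 0}   # comm messages the kernel receives (includes request_state)
sent = {'comm_msgs': 0}       # comm messages the kernel sends back on iopub

_original_handle_msg = Comm.handle_msg
_original_send = Comm.send

def handle_msg(self, msg):
    # Count every incoming comm message before handing it to the normal handler.
    received['comm_msgs'] += 1
    return _original_handle_msg(self, msg)

def send(self, data=None, metadata=None, buffers=None):
    # Count every outgoing comm message (e.g. state updates) before sending it.
    sent['comm_msgs'] += 1
    return _original_send(self, data=data, metadata=metadata, buffers=buffers)

Comm.handle_msg = handle_msg
Comm.send = send
```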
So we get all requests, and we send all the updates, but they don't arrive at the server.
Good idea, but I don't see any warning (it could always end up in /dev/null though). But the tracking of the reply happens before the rate limiting (i.e. the limiting is between the frontend and the server, and I already lost the message between the kernel and the server).
The iopub rate limiting is at the server, on messages going from the kernel to the browser. That seems to be where you are dropping messages? Can you up the rate limits to see if it solves the issue, just in case?
To up the limit to 10x its default:
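(The exact command isn't shown above. Assuming the classic notebook server, an equivalent would be something like this in jupyter_notebook_config.py:)

```python
# jupyter_notebook_config.py -- assumed equivalent of raising the limits 10x.
# The default iopub message rate limit is 1000 msgs/sec, as mentioned earlier.
c.NotebookApp.iopub_msg_rate_limit = 10000
# The byte-based limit can be raised the same way if it also gets in the way.
c.NotebookApp.iopub_data_rate_limit = 10_000_000
```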
I see
I intercept the messages at the server at: So I think this is before any rate limiting is applied, and that either the zmq layer does not send the packets, or the server does not receive them.
Gives us 1000, meaning it will buffer 1000 messages before dropping messages, as explained at http://api.zeromq.org/2-1:zmq-setsockopt (ZMQ_HWM: set high water mark). Patching ipykernel and removing that limit:
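(The ipykernel patch itself isn't included above. As a hypothetical illustration of the knob in question, here is a small pyzmq sketch that reads and removes the send high-water mark on a PUB socket, the socket type used for iopub; 0 means unlimited.)

```python
import zmq

# Hypothetical illustration: SNDHWM caps how many outgoing messages zmq will
# queue on a socket before it starts dropping them.
ctx = zmq.Context()
sock = ctx.socket(zmq.PUB)

print(sock.getsockopt(zmq.SNDHWM))   # 1000 by default in current libzmq
sock.setsockopt(zmq.SNDHWM, 0)       # 0 means "no limit"
print(sock.getsockopt(zmq.SNDHWM))

sock.close()
ctx.term()
```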
ah, nice! By the way, here is a nice small program for spying on a specific kernel directly, in case you want to do something like count the iopub messages coming directly from the kernel zmq socket. Use it by passing the kernel id as the first argument:

```python
from collections import Counter
from jupyter_client import BlockingKernelClient
from jupyter_core import paths
import os.path as path
import sys

# Locate the connection file for the kernel id passed on the command line.
connection_file = path.join(paths.jupyter_runtime_dir(), 'kernel-{}.json'.format(sys.argv[1]))
print(connection_file)

client = BlockingKernelClient()
client.load_connection_file(connection_file)
client.start_channels()

# Count iopub messages by type as they arrive straight from the kernel's zmq socket.
counter = Counter()
while True:
    msg = client.get_iopub_msg(timeout=100)
    counter[msg['msg_type']] += 1
    print(counter)
```
Where is the high water mark set? ZMQ apparently defaults to no limit, according to the docs you linked to.
Yeah, I'm puzzled by that as well. ipython/ipython#3304 says v3 puts it at 1000 by default; I am not sure what v4 does, digging into this now.
Do we really want to patch this? What was the original reason for limiting to 1000 msgs/sec? If we change this here, I guess we also need to change the ...
I think our first fix is just batching our update requests to try to be under this limit, which is a change to the widget manager.
I'm not saying we want to do that, but we want to understand the issue, and maybe have workarounds/configurations for people that need it.
I think this is an interesting idea; so you're saying we should change the widget protocol?
No, just that we should do what I mentioned above in #534 (comment).
ZMQ_SNDHWM at http://api.zeromq.org/4-3:zmq-setsockopt says 1000, so indeed. Should we discuss this at ipykernel, open an issue there?
Ah, now I understand, that's a great idea! Or, have at most 500 unanswered requests, using a Semaphore-like mechanism.
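(A hypothetical sketch of that Semaphore-like mechanism, with request_state standing in for whatever coroutine actually asks the kernel for a comm's state:)

```python
import asyncio

async def request_all_states(request_state, comm_ids, max_in_flight=500):
    """Allow at most `max_in_flight` unanswered state requests at any time."""
    sem = asyncio.Semaphore(max_in_flight)

    async def limited(comm_id):
        # Each request acquires a slot and releases it once the reply arrives.
        async with sem:
            return await request_state(comm_id)

    return await asyncio.gather(*(limited(c) for c in comm_ids))
```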
According to the zmq page:
So actually the message queue may be far smaller than 1000. Wow, sounds like we either need to be quite conservative, or raise the limit. Yes, let's start an issue on ipykernel and CC @minrk.
I was thinking of just putting it in a loop and awaiting the resolutions in batches, perhaps batches of 100, given the hwm limit may actually translate to something like 100 messages in certain cases?
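(A hypothetical sketch of that batching loop; the real change lives in the widget manager on the client side, but the pattern is the same:)

```python
import asyncio

async def gather_in_batches(request_factories, batch_size=100):
    """Await the requests in fixed-size batches, so that at most
    `batch_size` of them are outstanding at any moment."""
    results = []
    for start in range(0, len(request_factories), batch_size):
        # Only create (and send) the next batch once the previous one resolved.
        batch = [make() for make in request_factories[start:start + batch_size]]
        results.extend(await asyncio.gather(*batch))
    return results

# Usage sketch (request_state and comm_ids are hypothetical):
# states = await gather_in_batches(
#     [lambda c=c: request_state(c) for c in comm_ids], batch_size=100)
```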
…age limit in the kernel. Current ZMQ by default limits the kernel’s iopub message send queue to at most 1000 messages (and the real limit can be much lower, see ZMQ_SNDHWM at http://api.zeromq.org/4-3:zmq-setsockopt). We now request comm state in batches to avoid this limit. See voila-dashboards/voila#534 for more details, including where this was causing real problems.
I posted a PR to jupyter-widgets/ipywidgets#2765
Was this issue fixed? I am currently having an issue where a notebook that has multiple tabs, each with many widgets and some plotly plots, will only sometimes render when served with voila.
We're currently fixing this in #546. It was fixed only in the 0.1.18.x branch.
This is now fixed in voila 0.1.21, let us know if you still have issues!
This seems to have made a difference. So far, it has not failed to render.
Hi. I am posting here to ask for advice. I seem to be having widget refreshing issues similar to the descriptions in this ticket; however, the problems are strongly correlated with having started voila with preheated kernels. How would you diagnose it? Using voila 0.3.1.
Hello @tilusnet, can you open another ticket and provide us with a minimal example to reproduce the issue?
Hi @trungleduc. Sadly it's not something consistently reproducible. What I can tell you about my setup:
FYI @trungleduc - someone could test as per the documentation example here. One more factor that contributes: activation of preheated kernels.
#1101 could be it; it certainly is a bug that can cause the page to fail to render.
cc @aschlaep
Eg.
Will often fail to render in voila. I think it's likely to be a timing/order issue in the manager (a message received before an event handler is attached), or packets being dropped (io rate limit?).
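(The original snippet is not reproduced above; a hypothetical example in the same spirit, creating a few thousand comms, would be:)

```python
import ipywidgets as widgets
from IPython.display import display

n = 8  # hypothetical; the issue mentions n=8/n=9 but the original code is not shown
rows = [
    widgets.HBox([widgets.FloatSlider(value=j) for j in range(n)])
    for _ in range(n * n)
]
# Every slider also opens comms for its layout and style, so this amounts
# to a few thousand comms in total.
display(widgets.VBox(rows))
```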