-
Notifications
You must be signed in to change notification settings - Fork 31
Attempt to re-establish websocket connection to KG #42
Conversation
This may be related to #39 Sorry for no idea about how to make the test for this.
Test Result |
d9dc86a
to
1639002
Compare
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good Evan - thank you!
@IMAM9AIS - are you able to take this for a spin in your env?
nb2kg/handlers.py
Outdated
try: | ||
message = yield self.ws.read_message() | ||
except Exception as e: | ||
self.log.error("Exception reading message from websocket: {}".format(e)) # , exc_info=True) | ||
if message is None: | ||
if not self.disconnected: | ||
self.log.warning("Lost connection to KG: {}".format(self.kernel_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change KG
to Gateway
to match others.
nb2kg/handlers.py
Outdated
break | ||
callback(message) # pass back to notebook client (see self.on_open and WebSocketChannelsHandler.open) | ||
else: # ws cancelled - stop reading | ||
break | ||
|
||
if not self.disconnected: # NOTE(esevan): if websocket is not disconnected by client, try to reconnect. | ||
self.log.info("Try to re-establish the connection to KG: {}".format(self.kernel_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try
-> Attempting
, KG
-> Gateway
.
nb2kg/handlers.py
Outdated
if self.ws is not None: | ||
# Close connection | ||
self.ws.close() | ||
elif not self.ws_future.done(): | ||
# Cancel pending connection. Since future.cancel() is a noop on tornado, we'll track cancellation locally | ||
self.ws_future.cancel() | ||
self.ws_future_cancelled = True | ||
self.log.debug("_disconnect: ws_future_cancelled: {}".format(self.ws_future_cancelled)) | ||
self.log.debug("_disconnect: ws_future_cancelled: {}".format(self.disconnected)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we update the content here to be more like:
_disconnect: future cancelled, disconnected: {}
@esevan @kevin-bates I am sorry i could not get back on this quickly.
This check still returns a false signal, due to which a write is finally triggered but since the stream is not available the final write operation does not work. This is during the first shell message that you try to execute after the connection has been lost.
Line 247 in ddf6b7c
But since the exception is gracefully handled nothing happens and the client is still unaware of the loss. |
@esevan I will try the changes you have done, but i am not sure that on_close() event here :- Line 249 in ddf6b7c
will be called at all. Also, from the changes you have done (from what i can understand), let's say even if the on_close() event is being generated, it still won't reconnect the sockets but just print the log to do it, which might not be user-friendly. |
Rather than doing this, I have another suggestion :- What if while creating the connection between NB2KG and JEG, we also start a ping mechanism in the IOLoop, that pings every 1 min say. Now in scenario where we somehow lose a connection, the first ping will obviously do nothing but will help websocket realise that the stream is closed. The second ping will raise an exception which will propagate back here :- Line 247 in ddf6b7c
But rather than handling this exception, if you raise it this actually triggers a websocket close event back to jupyter lab. If we look into jupyter lab's websocket close event here :- Rather the immediately closing the websocket, it tries to do reconnect attempts with increasing timeouts which will immediately fix the connection if it can be fixed with the network state. |
When nb2kg lost the connection to KG, nb2kg didn't connect to KG again although the websocket connection from the client was still alive. This change recovers the connection to KG to prevent above anomaly. Signed-off-by: Eunsoo Park <[email protected]>
@kevin-bates Thanks for your review, kevin! @IMAM9AIS Websocket connection closing event is also listened by
One more thing for what it's worth, |
@esevan Thanks a lot for clearing this. I will try out changes and will let you know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thank you!
When nb2kg lost the connection to KG, nb2kg didn't connect to KG again
although the websocket connection from the client was still alive.
This change recovers the connection to KG to prevent above anomaly.
Signed-off-by: Eunsoo Park [email protected]