Need ability to identify dormant kernels to quiesce them #96
Thanks for opening this @rwhorman. The request is reasonable from an admin/devops perspective. The challenge will be getting the hooks into the right spots to watch for traffic to/from kernels over the websocket connections, including idle/busy status indications. This has partially been done with jupyter/configurable-http-proxy in the past, because adding the capability to the notebook server itself is a bit out of place. Here in kernel gateway, though, which is meant to be a programmatic API for kernels, it makes more sense IMHO. We'll give it a shot. Strawman API in
where:
Alternative: Add the metadata right into the
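(For illustration only: this alternative would presumably mean folding an activity field into the kernel model that `GET /api/kernels` already returns. The extra field below is an assumption; the concrete proposal did not survive the page capture.)

```python
# Hypothetical illustration of the "alternative": extend the kernel model
# returned by GET /api/kernels with an activity timestamp. The
# 'last_activity' field is assumed here, not part of the actual proposal.
kernel_model = {
    'id': 'kernel-uuid-1234',
    'name': 'python3',
    'last_activity': '2016-03-03T17:30:50Z',  # assumed extra field
}
```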
Probably the place to try would be to write a new WebSocketHandler override in the https://github.com/jupyter-incubator/kernel_gateway/blob/master/kernel_gateway/services/kernels/handlers.py module. In the new class, override on_message to watch traffic from a client to a kernel and _on_zmq_reply to watch traffic from a kernel to a client.
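A minimal sketch of what such an override might look like. Subclassing the notebook package's `ZMQChannelsHandler`, the `'activity_manager'` settings key, and the `touch` method on the manager are all assumptions for illustration, not the actual implementation:

```python
# Sketch: record a timestamp whenever traffic flows to or from a kernel.
# Assumes the notebook package's ZMQChannelsHandler as the base class and
# an 'activity_manager' object placed into the Tornado app settings.
from notebook.services.kernels.handlers import ZMQChannelsHandler


class ActivityTrackingChannelsHandler(ZMQChannelsHandler):
    """Notes per-kernel activity on every websocket/zmq message."""

    def on_message(self, message):
        # Client -> kernel traffic
        self.settings['activity_manager'].touch(self.kernel_id, 'to_kernel')
        super(ActivityTrackingChannelsHandler, self).on_message(message)

    def _on_zmq_reply(self, stream, msg):
        # Kernel -> client traffic
        self.settings['activity_manager'].touch(self.kernel_id, 'to_client')
        super(ActivityTrackingChannelsHandler, self)._on_zmq_reply(stream, msg)
```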
Define a new ActivityManager class, instantiate it in the gateway app, and pass it as settings available to both the new activity handler for
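A sketch of such an ActivityManager, under the same assumptions as above; the method and field names are illustrative, since the real values were still being finalized in this thread:

```python
# Hypothetical ActivityManager: an in-memory store of per-kernel
# activity timestamps, keyed by kernel id. Field names are assumptions.
import time


class ActivityManager(object):
    """Tracks the last observed traffic for each kernel."""

    def __init__(self):
        self.values = {}

    def touch(self, kernel_id, direction):
        # direction is 'to_kernel' or 'to_client' in this sketch
        record = self.values.setdefault(kernel_id, {})
        record['last_message_' + direction] = time.time()

    def get(self, kernel_id):
        return self.values.get(kernel_id, {})

    def remove(self, kernel_id):
        # Call when a kernel shuts down so the store doesn't grow unbounded
        self.values.pop(kernel_id, None)
```

The gateway app would then place the instance into settings, e.g. `settings['activity_manager'] = ActivityManager()`, so both the activity API handler and the websocket handler can reach it.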
@rwhorman Any concerns that the KG will hold this activity information in memory? Are you shutting down KGs independently of kernels? Some interesting scenarios:
How about this structure below:
where:
In the above scenarios, when I refer to
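The proposed structure itself did not survive the page capture; a purely illustrative reconstruction of per-kernel activity metadata might look like the following, with every field name an assumption:

```python
# Purely illustrative: a per-kernel activity record, keyed by kernel id.
# The concrete fields were still under discussion at this point.
activity = {
    'kernel-uuid-1234': {
        'last_message_to_kernel': 1457023850.3,   # epoch seconds
        'last_message_to_client': 1457023851.1,
        'last_time_state_changed': 1457023851.1,  # idle/busy transition
        'busy': False,
        'connections': 1,                         # open websocket count
    },
}
```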
Something that may help users decide on their specific use case is to provide more information:
#97 is a WIP, but a base implementation is there for
Nice. Glad it wasn't overly complicated. Some tests will help too once the values are defined.
@parente, yes on the tests; just waiting for us to finalize what we want.
@Lull3rSkat3r I like what you did there w/ the additional info. I can imagine ways to utilize the info to smart-track and potentially optimize the deactivation logic ... and I can imagine how we may consider feeding some of this back to the user to explain why deactivation happened; you know, those developers as end users ;-)
@lbustelo In-memory is cool. We only care while KG is up, and it comes down when we say so, i.e. when kernels are all gone or we deem them to be sufficiently inactive.

scenario 1: No, we wouldn't terminate a kernel if it is pushing data through. A ticking meter is a happy meter in the cloud services world ;-)

scenario 2: I view this as falling under best practices, in that long-running jobs need to write results to a storage service that they subsequently reconnect to and query. Another relevant aspect here is that the deactivation timeout can be tuned so as to catch the 80-20; for overnight use cases, we can set it to 12 hrs or so ... heck, even 24 hrs is not a bad TTL. What we wouldn't want is somebody sitting on an idle KG and kernels for multiple days.
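As a sketch of the kind of external reaper this implies: the script below polls a hypothetical `/_api/activity` endpoint (the path and payload shape are assumptions drawn from this thread) and culls kernels idle past the TTL via the standard `DELETE /api/kernels/{id}` REST call:

```python
# Sketch of an external "reaper" enforcing a TTL on idle kernels.
# /_api/activity and its payload are assumptions from this thread;
# DELETE /api/kernels/{id} is the standard Jupyter kernels REST call.
import time

import requests

GATEWAY = 'http://localhost:8888'
TTL_SECONDS = 12 * 60 * 60  # e.g. 12 hrs for overnight use cases


def cull_idle_kernels():
    activity = requests.get(GATEWAY + '/_api/activity').json()
    now = time.time()
    for kernel_id, info in activity.items():
        last_seen = max(info.get('last_message_to_kernel', 0),
                        info.get('last_message_to_client', 0))
        # Never cull a kernel that is busy or recently active
        if not info.get('busy') and now - last_seen > TTL_SECONDS:
            requests.delete(GATEWAY + '/api/kernels/' + kernel_id)


if __name__ == '__main__':
    while True:
        cull_idle_kernels()
        time.sleep(60)
```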
@rwhorman sounds good to me.
Looks good to me, @Lull3rSkat3r! Much appreciate the very quick turnaround!
In the context of a service that needs to efficiently manage jupyter-kernel-gateways for users, it is necessary to monitor kernels to detect a sufficient period of inactivity and trigger kernel shutdown when one is found.
In the bigger picture, where a kernel-gateway is provisioned for each user, it is necessary to quiesce that user's kernel-gateway when they've "stepped away" from the API and left their kernels inactive for an extended period of time; the gateway can be brought back up when they return.
Since kernels can drive work outside of jupyter/gateway, there's more involved than just monitoring the kernel-gateway. But within the kernel-gateway, we must be able to monitor the last real activity in a kernel, say as a timestamp. So, it would help to have an API that returns, for each kernel provisioned, the last time there was "activity" through the kernel. Not sure about the best way to handle the case of no kernels; if nothing is returned, there is no way to determine how long it has been since kernels were running, unless that state is handled outside of the gateway.