Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel ram metric #4075

Closed
wants to merge 3 commits into from
Closed

Kernel ram metric #4075

wants to merge 3 commits into from

Conversation

Hyaxia
Copy link
Contributor

@Hyaxia Hyaxia commented Oct 7, 2018

I have added a metric for the RAM per kernel.
Each kernel appears alone identified by his id and type.
I have chose to implement the monitoring by a background task with the tornado framework.
As soon as you start the notebook it will all the running kernels.
Every 0.2 seconds it will set the RAM usage of the specific kernel to what is uses at that moment.
Additionally, the way i check the ram of each process is with the psutil package.
I have found that this was the easiest way, if you guys have something else in mind please

relevant issue #3682

Hyaxia and others added 3 commits October 3, 2018 00:10
    Introducing the 'psutil' package as a new dependency.
    Added another 'Gauge' to the 'metrics.py' module.
    Added to the __init__ 'MappingKernelManager' a call to the function
    that will run the task of updating the metric as a background task.
    (NOT SURE IF THE WAY THAT THE BACKGROUND TASK IN IMPLEMENTED IS
     THE RIGHT ONE, BUT THAT IS WHAT I HAVE FOUND)
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea behind this feature - thank you for working on it! As I mention, this will break remote kernel applications like Jupyter Enterprise Gateway, but we could get to a good spot if there's a way to override capture.

Since this is potentially destabilizing, we should probably have it disabled by default.

Thanks.

@gen.coroutine
def kernel_ram_monitoring(self):
# Arbitrary time for waiting till next metric collection
yield gen.sleep(0.2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Five passes per second seems a little excessive to me. I suspect that on highly loaded servers (with lots of kernels running), the set of kernels may not get completed before the next iteration will occur. If this interval could be configured, that would be nice. Seems like a default of 1 second would be frequent enough for the normal case (of a couple kernels).

# Update the relevant kernel's ram usage metric
def update_ram_metric():
pid = kernel.kernel.pid
kernel_process = psutil.Process(pid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is assuming that all kernels are local to the server - which is not the case for Enterprise Gateway - which derives its kernel management from these classes. In this case, EG will break since the launching process is no longer running.

I would recommend moving the code that gets the memory usage to a method on the individual kernel manager class - this way child classes can do their own means of capturing memory usage. Unfortunately, Notebook doesn't define a kernel manager class. If one was added, it would derive from jupyer_client's IOLoopKernelManager - then we'd need to fix up the hierarchy usage in Jupyter Kernel and Enterprise Gateway. I'd be happy to help with this if others agree. The other option is to abstract the kernel class that represents the process. We do this in Enterprise Gateway with our process proxy classes, but that's a bit more work. We'd then have a 'hook' to place capture code for our remote kernels.

Any way we can make the capture of per-kernel info more abstract would be great.

In the meantime, we should probably make this feature optional and, preferably, off by default so as to not break applications.

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea behind this feature - thank you for working on it! As I mention, this will break remote kernel applications like Jupyter Enterprise Gateway, but we could get to a good spot if there's a way to override capture.

Since this is potentially destabilizing, we should probably have it disabled by default.

Thanks.

@minrk
Copy link
Member

minrk commented Oct 10, 2018

This should probably be implemented at the lower, KernelManager level in jupyter_client, so that when an implementation is swapped out, it can override the appropriate method.

@Hyaxia
Copy link
Contributor Author

Hyaxia commented Oct 15, 2018

Thabk you both, @minrk & @kevin-bates.
I will check the jupyter client.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants