
[WIP] Add method for kernel manager to retrieve statistics #407

Closed · wants to merge 4 commits

Conversation

@Hyaxia commented Nov 27, 2018:

This adds a new method called 'memory_usage' that checks the kernel's RAM usage locally.

I added a test for the method (definitely not sure it's a good test; there might be a better check to perform) and introduced the psutil package as a new dependency for the project.

Background:
The reason I want to add this method to the kernel manager is that in the notebook project we wanted to create functionality that lets us query the RAM usage of the kernels that are active. In PR jupyter/notebook#4075 it was brought to my attention that implementing it in notebook would break applications that use the kernels remotely.

For that reason I implemented the RAM check the way I did: importing the 'psutil' package inside the method, so that if the implementation is swapped out and no longer uses the package, it won't bother anyone.
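For reference, a minimal sketch of that approach (assuming a local kernel where self.kernel holds the kernel's subprocess handle, as in the local KernelManager):

def _memory_usage(self):
    """Return the RSS memory usage, in bytes, of the local kernel process."""
    # psutil is imported inside the method on purpose: if this implementation
    # is swapped out later, the dependency disappears with it.
    import psutil
    return psutil.Process(self.kernel.pid).memory_info().rss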

@Hyaxia changed the title "New method for kernel manager for checking ram." to "Add method for kernel manager for checking ram." Nov 30, 2018
@Hyaxia changed the title "Add method for kernel manager for checking ram." to "[Work-In-Progress]Add method for kernel manager for checking ram." Dec 16, 2018
@kevin-bates (Member) left a comment:

Thanks for the updates. I think kernel_stats() should return actual result values, not functions that produce the results later. That approach could be problematic depending on where the consumer of kernel_stats() resides (which, I'm assuming, may not be in the notebook process at all if this were to be plumbed to a handler).

What is the reason for returning functions instead of the snapshotted results? Is the idea that the callers would then have the particular functions in their hands and could call those methods at various intervals? If so, I think it's better to have the callers call the public kernel_stats() method periodically and leave the various functions that produce the results hidden from the caller.

The code under review:

        different stats about the kernel.
        """
        return {
            "memory_usage": self._memory_usage
@kevin-bates (Member) commented:

This should return the results of _memory_usage(); otherwise we'll run into issues when the consumer tries to get at the memory usage.

kernel_stats() should be the method that can be overridden, not _memory_usage(). This way subclasses could even extend the json results with their provider-specific details.
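In code, the suggested change amounts to (a sketch):

return {
    "memory_usage": self._memory_usage(),  # call it and return the snapshot value
}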

@Hyaxia (Author) replied:

@minrk thanks for the review!

I get what you are saying, but a few things:

You said, "That approach could be problematic depending on where the consumer of kernel_stats() resides" - could you give me an example of such a scenario?

You also said, "callers would then have the particular functions in their hands and could call those methods at various intervals? If so, I think it's better to have the callers call the public kernel_stats() method periodically and leave the various functions that produce the results hidden from the caller."

Yes, that was the intent of returning only the function objects. If the JSON that kernel_stats returns grows, the time it takes to collect all of the data grows with it, so running all of the data collection as soon as the method is executed will take more and more time.

Also, consider that different callers of the function will need different types of data. Not every use of kernel_stats will use all of the JSON values, meaning some of them are generated for nothing. If, for example, we need to check the kernel's RAM every few seconds, but other kinds of checks have also been added to the JSON and they all run as soon as kernel_stats is called, we will spend time gathering unneeded data just to get that one value, which results in a performance decrease.

I think that letting users choose the kind of data they get from the function is much preferable in terms of performance.

@kevin-bates (Member) replied:

@Hyaxia - I'm not sure if you're confusing me with Min, but, if so, I'm flattered! 😃

I apologize if this is getting convoluted (it seems to me that it is), but part of my issue is understanding how the results of this function are actually consumed. I figured, given your original PR to notebook, that notebook would be calling this method, BUT that the call in Notebook would be performed on behalf of a web handler, so that the consumer of the notebook would actually trigger the stats gathering. So this would eventually be plumbed as a REST call that web clients invoke to get the kernel usage statistics - one of which is memory consumption.

So, if kernel_stats returns a JSON of functions (and not the results of those functions), those "handles" only pertain to the notebook process itself. As a result, the web handler would need to invoke those functions, since that's the last chance at which the JSON provides meaningful context - the function handles' context would be lost otherwise. That's okay, but it strikes me as odd since those functions can be called within the kernel_stats() method itself.

I view this as a method that is called on the order of 5-10 second intervals (or even triggered by a manual "button" to get current stats), so the frequency is not very high in the grand scheme of things. However, to your point, no "category" of stats should take more than a few dozen milliseconds to capture. Even 100 ms is a long time, so I would hope that any other interesting categories are also inexpensive - and that should be a criterion for a category's inclusion in the JSON.

If we preserve the goal that none of the stats are expensive, then it should be fine to call this method on 5-10 second intervals w/o detriment. In addition, other implementations that MUST override kernel_stats() because the kernel is remote, etc. absolutely cannot return functions, since those functions do not apply to the caller's context. As a result, the boundary of work must be the public function - kernel_stats().

Btw, when remote access is required to get these stats, the latency of having to go remote in the first place argues that as much information as possible should be gathered at that time, since the cost of the stats call is far greater than the cost of gathering the different categories in the first place.

I hope that clarifies things a bit.

@Hyaxia (Author) commented Dec 21, 2018:

@kevin-bates haha, sorry for the confusion 😆

> That's okay, but it strikes me as odd since those functions can be called within the kernel_stats() method itself.

I agree, it might also make things complicated 😐

> so the frequency is not very high in the grand scheme of things

This really depends on the use case.

> If we preserve the goal that none of the stats are expensive, then it should be fine to call this method on 5-10 second intervals w/o detriment

In my opinion that is pretty restrictive. Many different types of data gathering might be added in the future, which means a time might come when you have to decide which kinds to keep and which to drop, because all of the functions inside kernel_stats execute and together take too much time.

> In addition, other implementations that MUST override kernel_stats() because the kernel is remote, etc. absolutely cannot return functions, since those functions do not apply to the caller's context. As a result, the boundary of work must be the public function - kernel_stats().

Yes, I agree. Let me make myself a bit clearer: I meant that the user should also override the _memory_usage function. You wouldn't want to leave the old implementation as is while creating a new one, leaving the old one standing there.

> Btw, when remote access is required to get these stats, the latency of having to go remote in the first place argues that as much information as possible should be gathered at that time, since the cost of the stats call is far greater than the cost of gathering the different categories in the first place.

I agree.

I thought about it some more; let me propose a new implementation. It is sort of a combination of the two approaches: the kernel_stats function will receive a list of strings, each string representing the type of data the caller wants to get. Iterating over the different types, we execute the appropriate functions. This way:

  1. The execution and gathering of the data happen inside the kernel_stats function.
  2. We execute only the functions that are necessary. This gives the user much more flexibility, because they can specify exactly which data they want. And because the user specifies what is collected, we no longer have the time restriction on the gathering functions that kernel_stats exposes; with no time limit, we can implement gathering functions that take much longer, leaving users to work out which intervals suit their use case.
  3. It will never be a problem to add another kind of data gathering, since the user is responsible for specifying their needs, and therefore for the time the gathering takes.
  4. The user of the function must know what kind of data they are looking for.

I will make a new commit soon.
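A minimal sketch of this proposal (the collectors mapping is illustrative; _memory_usage is the only gathering function in this PR so far):

def kernel_stats(self, *stats):
    """Gather only the requested statistics about the kernel."""
    collectors = {
        "memory": self._memory_usage,
        # future gathering functions would be added here
    }
    results = {}
    for name in stats:
        collector = collectors.get(name)
        if collector is not None:  # unknown names are simply skipped
            results[name] = collector()
    return results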

@kevin-bates (Member) replied:

@Hyaxia - this seems like a good approach - thank you.

I'm still unclear on who/what the actual client/caller of this method is and how they ultimately get the data. I'm assuming this will be exposed in the api/kernel/{kid} endpoint somehow, or that an api/kernel/{kid}/stats endpoint will be added. As of now, this method is usable as a form of diagnostics added ad hoc within the NB server code.

I think the list of values to gather will want to be exposed via a config parameter (e.g., --KernelManager.kernel_stats='memory_stats, latency'), but that can be added when a formal call to this method is implemented. In addition, there should be a way to skip this call - perhaps when the config parameter is not defined?

Since the categories are now specified, their names become more important. Looking at the command-line example, memory_stats should probably become memory, since stats is redundant. Or the suffix could be finer grained - memory_vm, memory_rss, etc. - relative to which statistic is being returned. Just a thought.

@kevin-bates (Member) replied:

Others can correct me if I'm wrong, but my understanding is that every subclass of Configurable supports the ability to specify runtime options via the command line and/or a configuration file. The command-line syntax is --<className>.<option>=<value> and the config-file syntax is c.<className>.<option>=<value>. So for this, the option could be defined on KernelManager or MultiKernelManager (probably depending on where the kernel_stats() method will be called from) and look something akin to this code in Notebook:

kernel_statistics = List(Unicode(), config=True,
    help=_("""Comma-separated list of supported categories to gather relative to the active kernel. No statistics are gathered if the option is not defined. Categories are: memory""")
)

You can then specify validation methods to ensure the values are valid.

@Hyaxia (Author) replied:

Let me see if I got it right. By adding the lines you wrote above:

kernel_statistics = List(Unicode(), config=True,
    help=_("""Comma-separated list of supported categories to gather relative to the active kernel. No statistics are gathered if the option is not defined. Categories are: memory""")
)

we will then check whether the configuration is set, and if so use it like the code below?

if kernel_statistics:
    stats = kernel_statistics

If so, how do these configuration tools work together with the *stats parameter that kernel_stats receives? Won't there be a conflict between the two? I'm not so familiar with the concept 😀

@kevin-bates (Member) replied:

Correct, although kernel_statistics (or whatever you call it) would be referenced in the code via self and essentially becomes the stats parameter indicating what to capture.

if self.kernel_statistics and len(self.kernel_statistics) > 0:
    return km.kernel_stats(self.kernel_statistics)

You could also define a @validate method that is called to validate the option's value when it is encountered during application launch.
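A sketch of what such a @validate method could look like; the supported-category set is an assumption, and Configurable stands in for the actual manager class:

from traitlets import List, TraitError, Unicode, validate
from traitlets.config import Configurable

SUPPORTED_CATEGORIES = {"memory"}  # assumption: "memory" is the only category so far

class StatsOwner(Configurable):  # stand-in for KernelManager
    kernel_statistics = List(Unicode(), config=True,
        help="List of statistic categories to gather. Categories are: memory")

    @validate("kernel_statistics")
    def _validate_kernel_statistics(self, proposal):
        unknown = set(proposal["value"]) - SUPPORTED_CATEGORIES
        if unknown:
            raise TraitError("Unsupported statistic categories: %s" % sorted(unknown))
        return proposal["value"]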

@Hyaxia (Author) commented Dec 22, 2018:

I've added the traitlet. A question: how would you handle exceptions that occur inside the kernel_stats function? ATM I'm simply logging the exception and stopping execution, returning the data that was collected so far.

Btw, I don't think there is a need for validation here, because we just skip any keys that are not found, and validating that the traitlet is a list of strings happens by default.

@kevin-bates (Member) replied:

I agree. Logging and continuing to gather the rest of the keys on exceptions is probably the right thing here, since it's a background "thread" anyway. The important thing is to document the expected behavior for errors and unrecognized keys. You could return a <key>_errmsg field containing the applicable exception message, but we should see how this pans out before going down that road.
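A sketch of that log-and-continue behavior (self._collectors is the hypothetical name-to-function mapping from the earlier sketch; the <key>_errmsg field is the optional idea above):

def kernel_stats(self, *stats):
    results = {}
    for name in stats:
        collector = self._collectors.get(name)
        if collector is None:
            continue  # unrecognized keys are skipped, per the documented behavior
        try:
            results[name] = collector()
        except Exception as e:
            self.log.warning("Failed to gather %r stats: %s", name, e)
            results[name + "_errmsg"] = str(e)
    return results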

@Hyaxia changed the title "[Work-In-Progress]Add method for kernel manager for checking ram." to "[Work-In-Progress]Add method for kernel manager to retrieve statistics" Dec 21, 2018
The code under review:

    if stats and len(stats) > 0:
        return get_stats(stats)
    # If no input was passed but there is a default configuration
    elif self.kernel_statistics and len(self.kernel_statistics) > 0:
@kevin-bates (Member) commented:

I don't think this elif is necessary. The caller should be using self.kernel_statistics as the stats parameter to this method call, so if stats is None or empty, then it seems like this method should return the empty set.

I suppose the other approach, now that kernel_statistics is defined, is to not have a stats parameter at all and use kernel_statistics entirely in this method. But having the stats parameter would allow callers to capture a subset of the supported categories that are specified in the configuration option.

@Hyaxia (Author) commented Dec 24, 2018:

I thought about it some more - why would it be better to have this trait? Wouldn't it be better to just let the consumer of the function decide what they want to get, without handling some sort of configuration? I see the reason for using a trait for something that stays fairly static - for example, the notebook_dir that you linked the validation method to. But this method is supposed to be flexible; we should not restrict users in choosing the data they wish to collect. By initializing kernel_statistics we make it harder for the consumer of the function to quickly and easily change the data they get.

> I don't think this elif is necessary. The caller should be using self.kernel_statistics as the stats parameter to this method call, so if stats is None or empty, then it seems like this method should return the empty set.

If we tell users to call the function with kernel_statistics, it becomes much less flexible, because for every little change we would have to change the trait instead of just changing the input to the function (this also applies to new uses of the function).

What are the pros of having such a trait? It seems restricting in this case, whereas *stats is not.

The only use case where I found it useful is, as I wrote, # If no input was passed but there is a default configuration (i.e., as a default config).

@kevin-bates (Member) replied:

I'm sorry for the delayed response. I guess I'm having difficulty understanding how the consumer of this method is implemented. I was under the impression these stats would be gathered periodically, probably from the notebook server code. Since that code is baked into the server, how does the end user control or influence what stats are gathered? Nothing has been defined thus far that makes what is gathered quick and easy to change.

I figured that by exposing the "stat categories" via a configuration option, a given installation could determine which stats are important and should be gathered. In addition, this trait also serves as an enablement switch, so that this functionality can be disabled (by default). I don't follow why an installation would want to gather different stats dynamically; what is gathered seems like a static thing to me.

At this point, I feel like things have gone sideways a bit, and I apologize if you're feeling frustrated. I believe the important things are that this functionality is disabled by default, that what is gathered can be controlled by an admin, and that all stats relative to a managed kernel are obtained in a single request. I believe the current incantation addresses those points, but if you'd prefer to take this a different way, that's fine as well.

Perhaps @rgbkrk could lend some direction.

@Hyaxia (Author) replied:

@kevin-bates It's all good - I am all for having a debate over this.

Let me explain how I see this functionality being used; maybe then you will see why I think letting this function be dynamic is important.

  1. Notebook server code that calls the function.
     In this use case there is the classic example that started this whole idea: the prometheus metric for memory usage of the kernel. Imagine that we have different types of metrics that we want to let the prometheus agent collect, and those metrics are exposed through this function. If you look at how prometheus collects different metrics, you will see that it does not all happen in one place: different metrics are collected at different intervals, in different places in the notebook server code. Thus, we have to give the code in the notebook server the flexibility - the freedom - to choose what it wishes to collect.

  2. A frontend button that, on a user click, returns a visual representation of the kernel statistics.

This freedom of choice is given by the *stats parameter, which is why I think that removing it would go against the main reason for all of this. From what I have seen, such flexibility is not wise to implement with traitlets configuration, which is supposed to be static once you start the server.

Are we on the same page here? Or is there something still unclear about my perspective?

@rgbkrk (Member) replied:

> From what I have seen, such flexibility is not wise to implement with traitlets configuration, which is supposed to be static once you start the server.

As an operator, I definitely prefer config traitlets that stay the same after starting the server.

I haven't read back into this PR much, but I'd say that we definitely want jupyter_client to behave more and more like a library (accepting options as objects passed in). Only applications should use traitlets for config. cc @minrk

@Hyaxia (Author) commented Jan 2, 2019:

@rgbkrk

> I'd say that we definitely want jupyter_client to behave more and more like a library (accepting options as objects passed in). Only applications should use traitlets for config

Cool, I agree.

For a second, let's say I remove the traitlet from the code. We are left with the implementation of the kernel_stats method, which accepts *stats as a parameter and returns the requested kinds of gathered data wherever possible.

Now, my last concern is that it might be better not to complicate things, and just leave the memory_usage method as is, without encapsulating it inside the kernel_stats method.

But I do see the positive side of having one method that encapsulates all of the gathering methods inside it. It creates a single, well-defined way of getting the statistics you want from the kernel - a JSON whose contents you know what to expect. It also saves lines of code. It feels robust, which is good, but in my opinion that comes at the cost of simplicity: there is a hidden mechanism inside kernel_stats, whereas plain, direct calls to the memory_usage method have none.

@rgbkrk, @kevin-bates - thoughts?

@kevin-bates (Member) commented:

Thanks for your responses - I found them helpful. The idea that jupyter_client is a library makes sense to me as well; I hadn't looked at it that way previously. Also helpful is understanding that the request that ultimately triggers the gathering of a kernel's statistics comes directly from the front-end. Your initial PR, as I recall, was calling the memory_stats method every half second or so.

I still feel that a generic method for collecting the kernel-specific stats is important. As I've mentioned previously, we don't want to be calling individual kernel-specific stats methods, since the call to each of those methods is expensive - particularly when we're talking about remote kernels and/or frequent intervals. A single method also makes overrides easier - in addition to the encapsulation it provides (both of which I view as positives).

What I'm still not following is how the Notebook user conveys the value of the *stats parameter. I think it might be better to let the method return what it can. In fact, some stats may not be applicable or even possible, depending on where the kernel resides, how it was launched, etc.

I viewed this feature more as an admin thing where an admin sits at a dashboard viewing all the kernels (note that enterprise and kernel gateway configurations manage kernels on behalf of multiple users). In this case, we'd want more of a polling collection mechanism so that when the admin requests stats, that REST call simply hits the MultiKernelManager (or subclass) and it walks the stats fields of each of the active KernelManager instances. Of course, even this could vary depending on how prometheus works. Either way, gathering of stats across multiple kernels should probably be an asynchronous mechanism.

What about this approach...

  1. Keep kernel_stats() but remove the *stats parameter. The contract would be that kernel_stats() returns whatever stats it implements for the given kernel. We can document what stats are possible. Some stats, like GPU usage, won't always apply. In other cases, there may not be a way to capture such stats, or that hasn't been implemented for that launch mechanism.
  2. Remove the configurable trait from jupyter_client. kernel_stats() will return what it can - so there's no need to specify what is desired (especially since that can't always be fulfilled).
  3. In whichever module the KernelManager.kernel_stats() method is called, define a configurable option that indicates the interval at which stats are gathered (default = 0, no stats gathered). I think a good recommended interval would be something on the order of 10 seconds. If you just choose the NB-user-trigger approach, this wouldn't be required - since the act of the user's request is the "interval".
  4. Store the results in the KernelManager instance. When the NB front-end requests stats for a given kernel, it hits the MappingKernelManager for that kernel_id and returns the currently gathered stats. This is only if a polling/async collection is utilized. Otherwise, MultiKernelManager just calls the KernelManager.kernel_stats() method for the given kernel id and returns those results.
  5. There would be two endpoints (eventually) implemented: api/kernel/stats and api/kernel/<kernel_id>/stats. The former returns stats for all active kernels, the latter returns stats for the specified kernel. Note: the former endpoint may not be necessary since all stats should probably be fed into prometheus (somehow).

By implementing this by hitting the Multi/MappingKernelManager, I believe we keep the door open for future considerations - like an overall dashboard across all kernels.

If a polling mechanism is used to periodically gather stats across all kernels, we will need to be careful about how long that takes, and it's probably a good idea to implement that endpoint as an async method if possible. This is also why a longer interval is recommended.

At any rate, that's my (latest) two cents 😃 and Happy New Year! 🎉
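A sketch of points 3 and 4 above (the trait name, the mixin, and the _kernels mapping are illustrative assumptions):

from traitlets import Float
from traitlets.config import Configurable

class StatsPollingMixin(Configurable):  # would be mixed into MultiKernelManager
    stats_interval = Float(0, config=True,
        help="Interval in seconds at which kernel stats are gathered (0 = disabled).")

    def all_kernel_stats(self):
        """Walk the active kernel managers and return each one's current stats."""
        return {kid: km.kernel_stats() for kid, km in self._kernels.items()}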

@Hyaxia (Author) commented Jan 3, 2019:

@kevin-bates Thanks for the suggestion, and happy new year!

> I still feel that a generic method for collecting the kernel-specific stats is important. As I've mentioned previously, we don't want to be calling individual kernel-specific stats methods, since the call to each of those methods is expensive - particularly when we're talking about remote kernels and/or frequent intervals. A single method also makes overrides easier - in addition to the encapsulation it provides (both of which I view as positives).

Yeah, I agree. I remember now that we talked about this before, but there was so much we talked about that I had already forgotten some of it 😅

> Also helpful is understanding that the request that ultimately triggers the gathering of a kernel's statistics comes directly from the front-end. Your initial PR, as I recall, was calling the memory_stats method every half second or so.

As I said, the front-end use case is only one of many that might come, so the function has to be as flexible as possible.

> I viewed this feature more as an admin thing where an admin sits at a dashboard viewing all the kernels

I disagree with this approach. As @rgbkrk said, this is nothing more than additional functionality that the jupyter_client library provides. We should not restrict the calling of the function in any way, giving full flexibility and maximum speed.

With all this in mind, let me make a new commit.

@Hyaxia (Author) commented Jan 3, 2019:

The way it's implemented ATM: there is a function called kernel_stats, which receives a list of parameters representing the statistics the caller wishes to get. The function calls all of the necessary functions it knows of to get the desired stats.

The _memory_usage function is used inside kernel_stats when the "memory" stat is wanted.

@Hyaxia (Author) commented Jan 3, 2019:

@rgbkrk, @minrk - is there anything I should still change? If not, I would like to reorder the commits in this PR and remove the redundant ones before merging.

@rgbkrk (Member) commented Jan 3, 2019:

@ivanov can you take a look here?

@Hyaxia (Author) commented Jan 12, 2019:

@rgbkrk @ivanov any updates? 😀

@ivanov (Member) left a comment:

I'm really very sorry that I didn't see this earlier (and even more sorry for not making the time to push forward the earlier prototype I had for user-defined asynchronous metrics). This looks good to me - you've added synchronous polling of metrics to the kernel manager at the API level, without introducing any changes to the protocol. 👍

@Hyaxia (Author) commented Jan 16, 2019:

@ivanov Great! I am glad to hear that it looks good. I have a few changes to the design; they apply only to the way we add new metrics. ATM that part is designed pretty poorly: if we want to add a new metric to be polled, we have to change both the kernel_stats function and the dictionary that maps a stat's name to the function that provides it, which violates the open/closed principle. All the rest of the design will stay as it is now (I hope).

I will commit the changes in a few days (hopefully no more than 3).

@rgbkrk (Member) commented Jan 17, 2019:

Awesome thank you both.

@Hyaxia (Author) commented Jan 18, 2019:

It looks like something is making the tests fail - nothing to do with the changes I have made.

@rgbkrk @ivanov it would be awesome if you could look again at the changes. Long story short, I decided that defining the polling of the metrics as a standalone class is much better than having it be a method of the KernelManager class. I have defined a standard for creating and using the kernel metric classes - a few rules that have to be followed for a metric class to be exposed and polled.

For any question regarding the design decisions, ask right away; I will answer as soon as I can.

@Hyaxia (Author) commented Feb 6, 2019:

@rgbkrk
@ivanov
Any updates? Questions?

@rgbkrk (Member) commented Feb 8, 2019:

I'm assuming you've rebased already. Does the error occur on master too?

@Hyaxia (Author) commented Feb 8, 2019:

@rgbkrk well, from what I have seen, it happens in every PR that tries to do anything. In my checks it fails with the message pytest-cov 2.6.1 has requirement pytest>=3.6, but you'll have pytest 3.3.0 which is incompatible. on one of the Python versions.

In #415, it fails with pytest-cov 2.6.1 has requirement pytest>=3.6, but you'll have pytest 3.0.7 which is incompatible. on one of the Python versions.

Any addition of a new metric will not affect the `KernelManager` class at all (see how it's designed).
@Hyaxia (Author) commented Feb 8, 2019:

@rgbkrk btw, I have now rebased and it's ready to be merged, unless there is anything you want to address about the design. But the problem with the tests is still there...

@Hyaxia changed the title "[Work-In-Progress]Add method for kernel manager to retrieve statistics" to "[Ready]Add method for kernel manager to retrieve statistics" Feb 22, 2019
@minrk (Member) commented Mar 5, 2019:

The pytest issue is fixed in master if you want to rebase (sorry for asking again, but there are at least no conflicts; we can rebase for you if you prefer). The added test fails after rebasing because psutil is not a dependency of jupyter_client yet, so it would either need to be added as a strict dependency, or this functionality made optional.

The use of ABC makes me think that these metrics are meant to be discoverable/extensible - is that true? Perhaps instead of using ABC subclasses, the listing should be discovered via entrypoints? That's how extension points are found elsewhere in Jupyter, and it's working rather well.

@Hyaxia (Author) commented Mar 6, 2019:

@minrk Thanks for the response.

About the first part of your response: I don't want to make psutil a strict dependency of the project, because implementations of the metric classes are dynamic. Creating the KernelMetricMemoryUsage class is just an example of a known kernel-metric requirement; it's not a must-have in the kernelmetrics.py file and can be removed.

Now for the second part: the way I see people using this is by implementing their own metric classes (subclasses of KernelMetric) in their own projects; those classes will automatically be recognized by kernel_metrics in the kernel manager at runtime, making this very dynamic and compliant with the open/closed principle. I chose ABC for its ability to make sure that all of the subclasses have the appropriate naming and methods. In the end, when different implementations of Jupyter notebook want to expose their unique metrics, they will create a subclass of KernelMetric and implement the needed logic in it.

Now I realize that the logic at the end of the kernelmetrics.py file that builds the list of known metric classes doesn't work when a user wants to extend the list from an external project. I am thinking of creating a class that acts as a registry for these types of classes: every new metric class would be registered with it, so that the jupyter_client project knows of its existence.

Any thoughts? Better ways to implement it? Can entrypoints provide the same or better options?
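A sketch of that registry idea, assuming Python 3.6+ so subclasses can self-register via __init_subclass__ (KernelMetric and KernelMetricMemoryUsage follow this PR's design as described; the poll signature and registration details are illustrative):

from abc import ABC, abstractmethod

class KernelMetric(ABC):
    registry = {}  # maps metric name -> metric class
    name = None    # concrete subclasses must set a unique metric name

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:  # skip abstract intermediates that define no name
            KernelMetric.registry[cls.name] = cls

    @abstractmethod
    def poll(self, kernel_manager):
        """Return the current value of this metric for the given kernel."""

class KernelMetricMemoryUsage(KernelMetric):
    name = "memory"

    def poll(self, kernel_manager):
        import psutil
        return psutil.Process(kernel_manager.kernel.pid).memory_info().rss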

@minrk (Member) commented Mar 7, 2019:

> I don't want to make psutil a strict dependency

That's okay; we'll just have to make sure that the metric is unavailable without an error if psutil is missing. That could mean gating the declaration of the class on importing psutil, or perhaps better, allowing metrics to return or raise a special value - e.g. NotImplemented, or a special jupyter_client.metrics.NotAvailable - when whatever is required is missing. If it's not a hard requirement, it should be added as a test requirement so that the tests can run.
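For example, gating on the import could look like this (NotAvailable is the hypothetical sentinel suggested above):

class NotAvailable:
    """Sentinel indicating that a metric's requirements are missing."""

def memory_usage(pid):
    try:
        import psutil
    except ImportError:
        return NotAvailable
    return psutil.Process(pid).memory_info().rss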

> those classes will automatically be recognized by kernel_metrics in the kernel manager at runtime

This isn't true in practice, though, because they aren't recognized as subclasses until the classes have been defined, which means the modules must be imported before discovery can find them. Entrypoints are discoverable as soon as they are installed, even if they aren't imported yet. That, combined with the fact that we already use entrypoints for this kind of thing elsewhere (JupyterHub Spawners, nbconvert Exporters, etc.), makes me think that entrypoints might be a better mechanism. What do you think?

The last issue, I think, is that KernelManagers need to define their metrics classes - or at least their capabilities - somehow. A metric that works with a pid isn't going to work with a remote KernelManager like Enterprise Gateway. Automatic discovery of kernel metrics can't be done for KernelManagers in general, since the metric implementations add requirements on the up-to-now private implementation details of the KernelManager.
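For illustration, entrypoint-based discovery could look like this (the group name "jupyter_client.kernel_metrics" is hypothetical):

import pkg_resources

def discover_metrics():
    """Find metric classes registered under the hypothetical entrypoint group."""
    metrics = {}
    for ep in pkg_resources.iter_entry_points("jupyter_client.kernel_metrics"):
        metrics[ep.name] = ep.load()  # imports the defining module, returns the class
    return metrics

# A third-party package would register its metric in its setup.py:
#   entry_points={
#       "jupyter_client.kernel_metrics": [
#           "memory = mypackage.metrics:KernelMetricMemoryUsage",
#       ],
#   },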

@Hyaxia (Author) commented Mar 10, 2019:

@minrk I am not sure how it works in practice - could you please point me to the implementations of the entry points you mentioned ("we already use entrypoints for this kind of thing elsewhere (JupyterHub Spawners, nbconvert Exporters, etc.)"), and maybe to the best explanation on the matter?

And a clarification regarding:

> This isn't true in practice, though, because they aren't recognized as subclasses until the classes have been defined, which means the modules must be imported before discovery can find them

I have thought about this and come to a solution: it's possible to create a kind of registry class that is updated every time a new subclass is defined, so the kernel manager can get the subclasses from it every time. Of course, I will first check how entrypoints are used in the project and see whether they fit better.

Thanks for the feedback!!!

Now the KernelMetricStore class acts as a store of all the metric classes that inherit from the `KernelMetric` class, so no matter where a new metric class is defined, the moment it is located it is added to the `TYPES` dictionary of KernelMetricStore, and you can poll it through the `kernel_metrics` function of the `KernelManager` class.
@Hyaxia changed the title "[Ready]Add method for kernel manager to retrieve statistics" to "[WIP]Add method for kernel manager to retrieve statistics" Jun 1, 2019
@Hyaxia closed this Oct 1, 2019
@Hyaxia (Author) commented Oct 1, 2019:

Notifications from jupyter/notebook#3682 brought me back to this PR. Sorry that I could not find the time to dive deeper into entry points, as @minrk wanted for the implementation. Maybe someone can continue from where we stopped.
