Add Support For Prometheus Metrics #348

seanmalloy · 2020-07-15T06:18:01Z

Is your feature request related to a problem? Please describe.

I'd like to monitor the descheduler using Prometheus metrics. Metrics should only be collected when the descheduler is run without the --dry-run CLI option.

Describe the solution you'd like

Add an optional configuration setting (a CLI option?) to enable exposing Prometheus metrics. So far I have not given much thought to exactly which metrics and labels the descheduler should expose.
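As an illustration only, here is a Go sketch of such an opt-in switch using the standard `flag` package. The `--enable-metrics` flag and the `metricsEnabled` helper are hypothetical and are not existing descheduler options; only `--dry-run` exists today, and the descheduler's own flag handling differs from this. The sketch also encodes the rule above that metrics are collected only when `--dry-run` is not set.

```go
package main

import (
	"flag"
	"fmt"
)

// metricsEnabled parses a hypothetical --enable-metrics opt-in flag next to a
// --dry-run flag (mirroring the existing CLI option) and reports whether
// metrics should be collected: only when enabled AND not a dry run.
func metricsEnabled(args []string) (bool, error) {
	fs := flag.NewFlagSet("descheduler-sketch", flag.ContinueOnError)
	enable := fs.Bool("enable-metrics", false, "expose Prometheus metrics (hypothetical flag)")
	dryRun := fs.Bool("dry-run", false, "evaluate strategies without evicting pods")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enable && !*dryRun, nil
}

func main() {
	for _, args := range [][]string{
		{"--enable-metrics"},
		{"--enable-metrics", "--dry-run"},
		{},
	} {
		on, err := metricsEnabled(args)
		fmt.Println(args, on, err)
	}
}
```

Keeping metrics off by default means existing deployments see no behaviour change until they opt in.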

Describe alternatives you've considered

I have considered using the descheduler logs with log aggregation software for monitoring the descheduler. I also considered using the k8s events generated by the descheduler in combination with eventrouter. The eventrouter does provide some simple Prometheus metrics for events that it collects.

What version of descheduler are you using?

descheduler version: v0.18.0

Additional context

Currently the descheduler is run as a Job or CronJob, which makes it difficult for Prometheus to scrape metrics from the descheduler pod because the pod is not long-lived. One option is to have the descheduler push metrics to the Prometheus Pushgateway.
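With the Pushgateway approach, a short-lived Job would send its metrics once before exiting. Below is a minimal, stdlib-only Go sketch that assumes a Pushgateway at the hypothetical address `http://pushgateway:9091` and uses a hypothetical metric name. It builds a `PUT` to the `/metrics/job/<job>` grouping path with a body in the Prometheus text format. In practice, the `push` package in `github.com/prometheus/client_golang` wraps this for you.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newPushRequest builds an HTTP request that pushes metrics in the Prometheus
// text exposition format to a Pushgateway under the grouping key /metrics/job/<job>.
// PUT replaces every metric previously pushed under that grouping key.
func newPushRequest(gatewayURL, job, body string) (*http.Request, error) {
	u := strings.TrimSuffix(gatewayURL, "/") + "/metrics/job/" + url.PathEscape(job)
	req, err := http.NewRequest(http.MethodPut, u, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "text/plain; version=0.0.4")
	return req, nil
}

func main() {
	// Hypothetical metric name and gateway address, for illustration only.
	body := "# TYPE descheduler_pods_evicted_total counter\n" +
		"descheduler_pods_evicted_total{strategy=\"RemoveDuplicates\"} 3\n"
	req, err := newPushRequest("http://pushgateway:9091", "descheduler", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// A real Job would now send it, e.g. with http.DefaultClient.Do(req),
	// and treat any non-2xx response as a failed push.
}
```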


damemi · 2020-07-27T15:32:25Z

> Currently the descheduler is run as a Job or CronJob this makes it difficult for Prometheus to scrape metrics from the descheduler pod because it is not long lived

The descheduler can run as a long-lived pod with the --descheduling-interval flag. This is actually the mode we chose in the OpenShift Descheduler Operator, because it also let us avoid depending on CronJobs, which are still beta.

farah · 2020-08-31T04:31:06Z

/assign

ingvagabund · 2020-09-02T13:22:00Z

@seanmalloy any particular metrics in mind? I can quickly think of:

- number of pods evicted per strategy
- number of pods that failed to be evicted per strategy

seanmalloy · 2020-09-02T14:50:13Z

> @seanmalloy any particular metrics in mind? I can quickly think of:
>
> - number of pods evicted per each strategy
> - number of pods failed to be evicted per each strategy

@ingvagabund yep that is pretty much it. See below for some additional details. I'm open to suggestions.

Metrics

| name | type | description |
|---|---|---|
| build_info | gauge | constant 1 |
| pods_evicted_success | counter | total number of pods successfully evicted |
| pods_evicted_failed | counter | total number of pods that failed to be evicted |

The metric names can be changed. Kubernetes (SIG Instrumentation) publishes guidelines for naming Prometheus metrics, and we should follow them when choosing the names.
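Prometheus' own naming guidance calls for a single-word application prefix and a `_total` suffix on counters. Here is a small Go sketch of composing names that way. `fqName` mirrors the joining behaviour of client_golang's `prometheus.BuildFQName`. The `descheduler_*` names are hypothetical examples and not necessarily the names that were eventually chosen.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty namespace, subsystem and name parts with "_",
// mirroring client_golang's prometheus.BuildFQName. An empty name yields "".
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

// counterName appends the conventional "_total" suffix used for counters.
func counterName(namespace, subsystem, name string) string {
	base := fqName(namespace, subsystem, name)
	if base == "" {
		return ""
	}
	return base + "_total"
}

func main() {
	fmt.Println(counterName("", "descheduler", "pods_evicted"))
	fmt.Println(fqName("", "descheduler", "build_info"))
}
```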

Labels

The pods_evicted_success and pods_evicted_failed metrics should have these labels:

- namespace
- descheduler strategy name
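To make the proposed labels concrete, here is a stdlib-only Go sketch of a counter keyed by namespace and strategy, rendered in the Prometheus text exposition format. `evictionCounter` is a hypothetical stand-in for a labelled counter from a real client library (e.g. a client_golang `CounterVec`), and the metric name is illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// labelEscaper escapes label values as the text exposition format requires.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// seriesKey holds the label values proposed for the eviction counters.
type seriesKey struct {
	Namespace string
	Strategy  string
}

// evictionCounter is a minimal, concurrency-safe stand-in for a labelled
// Prometheus counter: one running total per (namespace, strategy) pair.
type evictionCounter struct {
	mu     sync.Mutex
	name   string
	counts map[seriesKey]float64
}

func newEvictionCounter(name string) *evictionCounter {
	return &evictionCounter{name: name, counts: make(map[seriesKey]float64)}
}

// Inc adds one to the series identified by namespace and strategy.
func (c *evictionCounter) Inc(namespace, strategy string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[seriesKey{namespace, strategy}]++
}

// Expose renders every series in the text exposition format, sorted so the
// output is stable between scrapes.
func (c *evictionCounter) Expose() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.counts))
	for k, v := range c.counts {
		lines = append(lines, fmt.Sprintf(`%s{namespace="%s",strategy="%s"} %g`,
			c.name, labelEscaper.Replace(k.Namespace), labelEscaper.Replace(k.Strategy), v))
	}
	sort.Strings(lines)
	var b strings.Builder
	b.WriteString("# TYPE " + c.name + " counter\n")
	for _, l := range lines {
		b.WriteString(l)
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	c := newEvictionCounter("descheduler_pods_evicted_total") // hypothetical name
	c.Inc("default", "RemoveDuplicates")
	c.Inc("default", "RemoveDuplicates")
	c.Inc("team-a", "LowNodeUtilization")
	fmt.Print(c.Expose())
}
```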

The build_info metric should have these labels:

- Go version
- Descheduler version
- Git SHA1
- Git branch

seanmalloy · 2020-09-02T15:15:05Z

One thing I forgot...

We need a separate metric, or perhaps a label, to distinguish runs with the --dry-run CLI option from runs without it. I'd like to be able to collect metrics and see what the descheduler would do without evicting any pods.

ingvagabund · 2020-09-03T09:01:47Z

With --dry-run set, the descheduler will "evict" the same pods over and over again. Will the metrics still be useful to you (e.g. with the same pod counted 1000 times)?

seanmalloy · 2020-09-04T13:19:47Z

> With --dry-run set you will be evicting the same pods all over again. Will the metrics be still useful for you (e.g. with the same pod counted 1000 times)

@ingvagabund after thinking about it, I believe you are correct. I edited the original description to remove the requirement of collecting metrics when the --dry-run CLI option is used.

seanmalloy · 2020-09-26T04:08:58Z

@farah I see that this issue is currently assigned to you. I'm just checking in to see if you have any questions or need any assistance. We are trying to complete this feature as part of the descheduler v0.20.0 release which should be sometime in December.

Thanks!

eatwithforks · 2020-12-16T17:52:52Z

👍 for this feature. @seanmalloy I see v0.20.0 was released 6 days ago, but I didn't see any mention of Prometheus metrics collection in the changelog. Is this feature still in the works, and do you have an ETA?

seanmalloy · 2020-12-16T18:10:35Z

> 👍 for this feature. @seanmalloy I see v0.20.0 was released 6 days ago but didn't see mentions of prometheus metrics collection in the changelog. Is this feature still in the works and do you have an estimate on ETA?

This feature was not implemented for v0.20.0. Maybe it will land in the v0.21.0 release; I'm definitely hoping for sometime in 2021.

damemi · 2020-12-16T22:31:31Z

Are there currently any PRs open for this? The last activity was a couple of months ago, so it may be safe to assume this is open for anyone interested in contributing!

seanmalloy · 2020-12-17T17:21:14Z

> Is there currently any PRs open for this? The last activity was a couple months ago so it may be safe to assume this is open for anyone interested in contributing!

Correct. Basically we need someone to volunteer to write the code to implement this feature. It's on my extended todo list, but I don't have time to do this right now, so I'm not assigning myself at the moment.

/help

k8s-ci-robot · 2020-12-17T17:21:15Z

@seanmalloy:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

damemi · 2021-02-19T21:54:04Z

A good first step for the implementation would be to add some of these metrics to podEvictor (#503). I'll have some time to work on this, unless someone else is interested.

ingvagabund · 2021-02-22T23:29:33Z

WIP PR for discussion: #505

seanmalloy · 2021-03-07T05:43:12Z

This feature will ship with descheduler v0.21.0.

seanmalloy added the kind/feature label Jul 15, 2020

k8s-ci-robot assigned farah Aug 31, 2020

seanmalloy mentioned this issue in Kubernetes 1.20 Release Cycle #400 (closed) Sep 15, 2020

k8s-ci-robot unassigned farah Dec 17, 2020

k8s-ci-robot added the help wanted label Dec 17, 2020

damemi mentioned this issue in Improve podEvictor statistics #503 (closed) Feb 19, 2021

ingvagabund mentioned this issue in Collect metrics #505 (merged) Feb 22, 2021

k8s-ci-robot closed this as completed in #505 Mar 7, 2021
