Is there or will there ever be a metric for pods in terminating state? #348

Closed
cullepl opened this issue Jan 17, 2018 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@cullepl commented Jan 17, 2018

I see there is a metric for whether a container has terminated, kube_pod_container_status_terminated; however, we sometimes observe pods stuck in the Terminating state.

Is there a way to surface this state via kube-state-metrics? I couldn't see anything, and I'm running v1.2.0.

All I can see in our Prometheus are these:

[screenshot of the available metrics]

If pods are stuck in the Terminating state, we usually have to take action against the hosts they are running on.
We'd ideally like to be alerted via Prometheus and Alertmanager using kube-state-metrics metrics, rather than having to build something home-grown.
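
For illustration, the kind of alerting rule we have in mind would look roughly like the sketch below. The metric name kube_pod_status_terminating is made up for the example; nothing like it exists in v1.2.0 as far as I can tell.

groups:
  - name: pod-lifecycle
    rules:
      - alert: PodStuckTerminating
        # kube_pod_status_terminating is a hypothetical metric name, used only
        # to illustrate the alert we would like to be able to write.
        expr: kube_pod_status_terminating == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been Terminating for more than 15 minutes"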

@brancz (Member) commented Jan 17, 2018

How are you determining this today?

If kube-state-metrics can find what you are looking for in a Pod object then yes, that's something we could add a metric for.

@cullepl (Author) commented Jan 18, 2018

If I query pods using kubectl get pods -n my-namespace today, I see this:

my-namespace   my-pod-1   6/6   Terminating   1    11d   <IP>     1.2.3.4
my-namespace   my-pod-2   1/6   Terminating   22   10d   <none>   1.2.3.5

If I describe one of the pods, the status is:

Status:    Terminating (expires Sat, 13 Jan 2018 16:35:28 +0000)

@brancz (Member) commented Jan 18, 2018

When you view the yaml of this pod, can you see it there as well, or is this only available through events?

@cullepl (Author) commented Jan 18, 2018

Looks like it's only visible through describing the pod.
If I do kubectl get pod -n <ns> <pod-name> -o yaml, there is no reference to Terminating. At the end of the YAML output there is:

  phase: Running
  podIP: 1.2.3.4
  startTime: 2018-01-06T06:21:34Z

@brancz (Member) commented Jan 19, 2018

Yes, that's what I thought. The tricky thing here is that we can only extract this information from Kubernetes Events. The problem with events is their cardinality: it is most likely going to explode. Something we could do is aggregate events and expose lower-cardinality aggregations. That would be a very different pattern from all the other collectors today, where the objects we create metrics from are held in memory; with events we would need to listen for them and garbage-collect any that are older than some period.

We could try this as an experiment, but my general feeling is that this is going to be somewhat unstable data.
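
To make that more concrete, here is a rough sketch of the aggregation pattern. It is illustrative only, not an actual kube-state-metrics collector: the type names and the kube_event_count metric name are invented, and wiring Observe() up to a client-go informer on Event objects is left out.

// A rough sketch of the aggregation idea described above: instead of one
// time series per event, keep a low-cardinality count per (namespace, reason)
// pair and garbage-collect entries that have not been seen for a while.
package main

import (
    "fmt"
    "sync"
    "time"
)

// key is the low-cardinality aggregation key, e.g. ("my-namespace", "Killing").
type key struct {
    Namespace string
    Reason    string
}

type entry struct {
    Count    uint64
    LastSeen time.Time
}

// EventAggregator accumulates event counts and expires stale aggregations.
type EventAggregator struct {
    mu      sync.Mutex
    ttl     time.Duration
    entries map[key]*entry
}

func NewEventAggregator(ttl time.Duration) *EventAggregator {
    return &EventAggregator{ttl: ttl, entries: map[key]*entry{}}
}

// Observe records one event occurrence. In a real collector this would be
// called from an informer's add/update handlers for v1.Event objects.
func (a *EventAggregator) Observe(namespace, reason string) {
    a.mu.Lock()
    defer a.mu.Unlock()
    k := key{Namespace: namespace, Reason: reason}
    e, ok := a.entries[k]
    if !ok {
        e = &entry{}
        a.entries[k] = e
    }
    e.Count++
    e.LastSeen = time.Now()
}

// GC drops aggregations not updated within the TTL; this is the
// "garbage collect them if they're older than some period" part.
func (a *EventAggregator) GC() {
    a.mu.Lock()
    defer a.mu.Unlock()
    cutoff := time.Now().Add(-a.ttl)
    for k, e := range a.entries {
        if e.LastSeen.Before(cutoff) {
            delete(a.entries, k)
        }
    }
}

func main() {
    agg := NewEventAggregator(30 * time.Minute)
    agg.Observe("my-namespace", "Killing")
    agg.Observe("my-namespace", "Killing")
    agg.GC()
    for k, e := range agg.entries {
        // A real collector would expose these as metric samples, e.g. a
        // hypothetical kube_event_count{namespace="...",reason="..."}.
        fmt.Printf("namespace=%s reason=%s count=%d\n", k.Namespace, k.Reason, e.Count)
    }
}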

@fejta-bot commented:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) on Apr 19, 2018
@fejta-bot commented:
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on May 19, 2018
@fejta-bot commented:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@naveenb29 commented:
/reopen /remove-lifecycle rotten

@nucl3arj4zz commented:
Will this issue be re-opened? /reopen

@mxinden (Contributor) commented Feb 21, 2019

Given that this information is not extractable from a pod manifest (see the comment above), this cannot be achieved with the current architecture of kube-state-metrics.

I am not sure we should introduce event aggregation just to support this use case. What are your thoughts?
