Is there or will there ever be a metric for pods in terminating state? #348

Closed
cullepl opened this issue Jan 17, 2018 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@cullepl commented Jan 17, 2018

I see there is a metric for whether a container has terminated, kube_pod_container_status_terminated; however, we sometimes observe pods stuck in the Terminating state.

Is there a way to surface this state via kube-state-metrics? I couldn't see anything, and I'm running v1.2.0.

All I can see in our Prometheus are these:

[screenshot of the available metrics]

If pods are stuck in the Terminating state, we usually have to take action against the hosts they are running on.
We'd ideally like to be alerted via Prometheus and Alertmanager using kube-state-metrics metrics, rather than having to build something home-grown.
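
For illustration, the kind of alerting rule we have in mind would look roughly like the sketch below. The metric name kube_pod_status_terminating is made up for the example; nothing like it exists in v1.2.0 as far as I can tell.

groups:
  - name: pod-lifecycle
    rules:
      - alert: PodStuckTerminating
        # kube_pod_status_terminating is a hypothetical metric name, used only
        # to illustrate the alert we would like to be able to write.
        expr: kube_pod_status_terminating == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been Terminating for more than 15 minutes"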

@brancz (Member) commented Jan 17, 2018

How are you determining this today?

If kube-state-metrics can find what you are looking for in a Pod object then yes, that's something we could add a metric for.

@cullepl (Author) commented Jan 18, 2018

If I query pods using kubectl get pods -n my-namespace today, I see this:

my-namespace   my-pod-1   6/6   Terminating   1    11d   <IP>     1.2.3.4
my-namespace   my-pod-2   1/6   Terminating   22   10d   <none>   1.2.3.5

If I describe one of the pods, the status is:

Status:    Terminating (expires Sat, 13 Jan 2018 16:35:28 +0000)

@brancz (Member) commented Jan 18, 2018

When you view the yaml of this pod, can you see it there as well, or is this only available through events?

@cullepl (Author) commented Jan 18, 2018

Looks like it's only visible through describing the pod.
If I do kubectl get pod -n <ns> <pod-name> -o yaml, there is no reference to Terminating. At the end of the YAML output there is:

  phase: Running
  podIP: 1.2.3.4
  startTime: 2018-01-06T06:21:34Z

@brancz (Member) commented Jan 19, 2018

Yes, that's what I thought. The tricky thing here is that we can only extract this information from Kubernetes Events. The problem with events is their cardinality: it is most likely going to explode. Something we could do is aggregate events and expose lower-cardinality aggregations. That would be a very different pattern from all the other collectors today, where the objects we create metrics from are held in memory; with events we would need to listen for them and garbage-collect any that are older than some period.

We could try this as an experiment, but my general feeling is that this is going to be somewhat unstable data.
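
To make that more concrete, here is a rough sketch of the aggregation pattern. It is illustrative only, not an actual kube-state-metrics collector: the type names and the kube_event_count metric name are invented, and wiring Observe() up to a client-go informer on Event objects is left out.

// A rough sketch of the aggregation idea described above: instead of one
// time series per event, keep a low-cardinality count per (namespace, reason)
// pair and garbage-collect entries that have not been seen for a while.
package main

import (
    "fmt"
    "sync"
    "time"
)

// key is the low-cardinality aggregation key, e.g. ("my-namespace", "Killing").
type key struct {
    Namespace string
    Reason    string
}

type entry struct {
    Count    uint64
    LastSeen time.Time
}

// EventAggregator accumulates event counts and expires stale aggregations.
type EventAggregator struct {
    mu      sync.Mutex
    ttl     time.Duration
    entries map[key]*entry
}

func NewEventAggregator(ttl time.Duration) *EventAggregator {
    return &EventAggregator{ttl: ttl, entries: map[key]*entry{}}
}

// Observe records one event occurrence. In a real collector this would be
// called from an informer's add/update handlers for v1.Event objects.
func (a *EventAggregator) Observe(namespace, reason string) {
    a.mu.Lock()
    defer a.mu.Unlock()
    k := key{Namespace: namespace, Reason: reason}
    e, ok := a.entries[k]
    if !ok {
        e = &entry{}
        a.entries[k] = e
    }
    e.Count++
    e.LastSeen = time.Now()
}

// GC drops aggregations not updated within the TTL; this is the
// "garbage collect them if they're older than some period" part.
func (a *EventAggregator) GC() {
    a.mu.Lock()
    defer a.mu.Unlock()
    cutoff := time.Now().Add(-a.ttl)
    for k, e := range a.entries {
        if e.LastSeen.Before(cutoff) {
            delete(a.entries, k)
        }
    }
}

func main() {
    agg := NewEventAggregator(30 * time.Minute)
    agg.Observe("my-namespace", "Killing")
    agg.Observe("my-namespace", "Killing")
    agg.GC()
    for k, e := range agg.entries {
        // A real collector would expose these as metric samples, e.g. a
        // hypothetical kube_event_count{namespace="...",reason="..."}.
        fmt.Printf("namespace=%s reason=%s count=%d\n", k.Namespace, k.Reason, e.Count)
    }
}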

@fejta-bot commented:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) on Apr 19, 2018
@fejta-bot commented:
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on May 19, 2018
@fejta-bot commented:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@naveenb29 commented:
/reopen /remove-lifecycle rotten

@nucl3arj4zz commented:
Will this issue be re-opened? /reopen

@mxinden (Contributor) commented Feb 21, 2019

Given that this information is not extractable from a pod manifest (see the comment above), this cannot be achieved with the current architecture of kube-state-metrics.

I am not sure we should introduce event aggregation just to support this use case. What are your thoughts?
