Prometheus pod discovery not working for monitor_kubernetes_pods=true and pod_scrape_scope=cluster since 1.18.3 #9600

gracewehner · 2021-08-07T00:21:04Z

Relevant telegraf.conf:

[[inputs.prometheus]]
  metric_version = 2
  pod_scrape_scope = "cluster"

  ## Scrape Kubernetes pods for the following prometheus annotations:
  ## - prometheus.io/scrape: Enable scraping for this pod
  ## - prometheus.io/scheme: If the metrics endpoint is secured then you will need to
  ##     set this to `https` & most likely set the tls config.
  ## - prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
  ## - prometheus.io/port: If port is not 9102 use this annotation
  monitor_kubernetes_pods = true

  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
  response_timeout = "15s"

  tls_ca = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
  insecure_skip_verify = true

System info:

Telegraf 1.18.3 - 1.19.2
Telegraf executable run as a process in a Kubernetes container

Steps to reproduce:

Deploy a pod that exposes prometheus metrics at <pod_ip>:<port>/<metrics path> with the annotations in the comment for monitor_kubernetes_pods above.
Run telegraf with the prometheus input plugin and the settings monitor_kubernetes_pods = true, pod_scrape_scope = "cluster".
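For anyone reproducing this, here is a rough sketch of how the annotations listed in the config comment would map to the URL that should end up being scraped. This is not the plugin's code: scrapeTarget is a hypothetical helper, and both the "true" value for prometheus.io/scrape and the "http" default scheme are assumptions. The 9102 port and /metrics path defaults come from the config comment above.

```go
package main

import (
	"net"
	"net/url"
)

// scrapeTarget is a hypothetical helper (not part of Telegraf) that builds the
// URL a pod would be scraped at, based on its prometheus.io/* annotations.
// It returns false when the pod has not opted in to scraping.
func scrapeTarget(podIP string, annotations map[string]string) (string, bool) {
	// Assumption: scraping is enabled with the literal value "true".
	if annotations["prometheus.io/scrape"] != "true" {
		return "", false
	}

	// Assumption: plain http unless the scheme annotation says otherwise.
	scheme := annotations["prometheus.io/scheme"]
	if scheme == "" {
		scheme = "http"
	}

	// Defaults taken from the config comment: port 9102, path /metrics.
	port := annotations["prometheus.io/port"]
	if port == "" {
		port = "9102"
	}
	path := annotations["prometheus.io/path"]
	if path == "" {
		path = "/metrics"
	}

	u := url.URL{Scheme: scheme, Host: net.JoinHostPort(podIP, port), Path: path}
	return u.String(), true
}
```

A pod with IP 10.0.0.5 and only prometheus.io/scrape set would be expected at http://10.0.0.5:9102/metrics under these assumptions. With the bug described below, no such target is ever registered.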

Expected behavior:

Telegraf will discover the pods that have the annotations and scrape the metrics exposed by those pods.

Actual behavior:

No pods are registered in the kubernetes.go code of the prometheus input plugin. Since no pods are discovered/registered, no prometheus metrics are scraped.

Additional info:

This is different from issues #9349 and #9408, which are for pod_scrape_scope = "node".

Looks like the issue stems from #8937 and this line in kubernetes.go, where the pod struct is never populated from the event object, so the registered pod has no info about the endpoint to scrape.
Replacing that line with something like:

pod, ok := event.Object.(*corev1.Pod)
if !ok {
	return fmt.Errorf("unexpected object when getting pods")
}

fixes the issue by getting the pod from the watch event.
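To show the pattern in isolation, here is a minimal, self-contained sketch of the comma-ok type assertion. Pod and Event are hypothetical stand-ins for corev1.Pod and the client-go watch event so the snippet compiles without the Kubernetes modules, and podFromEvent is an illustrative name, not a function in the plugin.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for corev1.Pod, reduced to the fields
// needed to illustrate the point.
type Pod struct {
	Name  string
	PodIP string
}

// Event is a hypothetical stand-in for a watch event. Only the Object
// field matters here; it carries the resource the event refers to.
type Event struct {
	Object interface{}
}

// podFromEvent pulls the pod out of the event with a comma-ok type
// assertion, so an unexpected payload produces an error instead of a panic
// or an empty pod.
func podFromEvent(event Event) (*Pod, error) {
	pod, ok := event.Object.(*Pod)
	if !ok {
		return nil, fmt.Errorf("unexpected object of type %T when getting pods", event.Object)
	}
	return pod, nil
}
```

The point is that the pod comes from the event itself, so it carries the IP and metadata needed to build the scrape target, rather than being a zero-valued struct.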

I have a fork and can make a PR with this change.


gracewehner · 2021-08-09T18:42:39Z

Created the PR #9605

gracewehner added the bug label (unexpected problem or unintended behavior) Aug 7, 2021

telegraf-tiger bot added the area/discovery label Aug 7, 2021

This was referenced Aug 9, 2021

Fix out_oms.go dependency vulnerabilities microsoft/Docker-Provider#623

Merged

fix: issues with prometheus kubernetes pod discovery #9605

Merged

gracewehner mentioned this issue Aug 9, 2021

Migrate from github.com/ericchiang/k8s to github.com/kubernetes/client-go #8937

Merged

reimda closed this as completed in #9605 Aug 17, 2021

Pranav-Balakumar mentioned this issue Nov 23, 2021

Scaled pods not detected by telegraf in Kubernetes cluster #10148

Closed
