
[Inputs.prometheus] Error when parsing metrics from kube-state-metrics version v2.8.1 #12749

Closed
tantm3 opened this issue Feb 27, 2023 · 19 comments
Labels
bug unexpected problem or unintended behavior

Comments

@tantm3

tantm3 commented Feb 27, 2023

Relevant telegraf.conf

[[inputs.prometheus]]
      ## An array of urls to scrape metrics from.
      # urls = ["http://localhost:9100/metrics"]

      ## Metric version controls the mapping from Prometheus metrics into Telegraf metrics.
      ## See "Metric Format Configuration" in plugins/inputs/prometheus/README.md for details.
      ## Valid options: 1, 2
      # metric_version = 1

      ## Url tag name (tag containing scraped url. optional, default is "url")
      # url_tag = "url"

      ## Whether the timestamp of the scraped metrics will be ignored.
      ## If set to true, the gather time will be used.
      # ignore_timestamp = false

      ## An array of Kubernetes services to scrape metrics from.
      kubernetes_services = ["http://kube-state-metrics.kube-system:8080/metrics"]

      ## Kubernetes config file to create client from.
      # kube_config = "/path/to/kubernetes.config"

      ## Scrape Pods
      ## Enable scraping of k8s pods. Further settings as to which pods to scrape
      ## are determined by the 'method' option below. When enabled, the default is
      ## to use annotations to determine whether to scrape or not.
      # monitor_kubernetes_pods = false

      ## Scrape Pods Method
      ## annotations: default, looks for specific pod annotations documented below
      ## settings: only look for pods matching the settings provided, not
      ##   annotations
      ## settings+annotations: looks at pods that match annotations using the user
      ##   defined settings
      # monitor_kubernetes_pods_method = "annotations"

      ## Scrape Pods 'annotations' method options
      ## If the method is set to 'annotations' or 'settings+annotations', these
      ## annotation flags are looked for:
      ## - prometheus.io/scrape: Required to enable scraping for this pod. Can also
      ##     use 'prometheus.io/scrape=false' annotation to opt-out entirely.
      ## - prometheus.io/scheme: If the metrics endpoint is secured then you will
      ##     need to set this to 'https' & most likely set the tls config
      ## - prometheus.io/path: If the metrics path is not /metrics, define it with
      ##     this annotation
      ## - prometheus.io/port: If port is not 9102 use this annotation

      ## Scrape Pods 'settings' method options
      ## When using 'settings' or 'settings+annotations', the default values for
      ## annotations can be modified with the following options:
      # monitor_kubernetes_pods_scheme = "http"
      # monitor_kubernetes_pods_port = "9102"
      # monitor_kubernetes_pods_path = "/metrics"

      ## Get the list of pods to scrape with either the scope of
      ## - cluster: the kubernetes watch api (default, no need to specify)
      ## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
      # pod_scrape_scope = "cluster"

      ## Only for node scrape scope: node IP of the node that telegraf is running on.
      ## Either this config or the environment variable NODE_IP must be set.
      # node_ip = "10.180.1.1"

      ## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
      ## Default is 60 seconds.
      # pod_scrape_interval = 60

      ## Restricts Kubernetes monitoring to a single namespace
      ##   ex: monitor_kubernetes_pods_namespace = "default"
      # monitor_kubernetes_pods_namespace = ""
      ## The name of the label for the pod that is being scraped.
      ## Default is 'namespace' but this can conflict with metrics that have the label 'namespace'
      # pod_namespace_label_name = "namespace"
      # label selector to target pods which have the label
      # kubernetes_label_selector = "env=dev,app=nginx"
      # field selector to target pods
      # eg. To scrape pods on a specific node
      # kubernetes_field_selector = "spec.nodeName=$HOSTNAME"

      # cache refresh interval to set the interval for re-sync of pods list.
      # Default is 60 minutes.
      # cache_refresh_interval = 60

      ## Scrape Services available in Consul Catalog
      # [inputs.prometheus.consul]
      #   enabled = true
      #   agent = "http://localhost:8500"
      #   query_interval = "5m"

      #   [[inputs.prometheus.consul.query]]
      #     name = "a service name"
      #     tag = "a service tag"
      #     url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
      #     [inputs.prometheus.consul.query.tags]
      #       host = "{{.Node}}"

      ## Use bearer token for authorization. ('bearer_token' takes priority)
      # bearer_token = "/path/to/bearer/token"
      ## OR
      # bearer_token_string = "abc_123"

      ## HTTP Basic Authentication username and password. ('bearer_token' and
      ## 'bearer_token_string' take priority)
      # username = ""
      # password = ""

      ## Optional custom HTTP headers
      # http_headers = {"X-Special-Header" = "Special-Value"}

      ## Specify timeout duration for slower prometheus clients (default is 3s)
      # timeout = "3s"
      
      ##   deprecated in 1.26; use the timeout option
      # response_timeout = "3s"
      
      ## HTTP Proxy support
      # use_system_proxy = false
      # http_proxy_url = ""

      ## Optional TLS Config
      # tls_ca = /path/to/cafile
      # tls_cert = /path/to/certfile
      # tls_key = /path/to/keyfile

      ## Use TLS but skip chain & host verification
      # insecure_skip_verify = false

Logs from Telegraf

2023-02-27T15:10:30Z E! [inputs.prometheus] Error in plugin: error reading metrics for http://10.1.181.230:8080/metrics: reading metric family protocol buffer failed: proto: cannot parse invalid wire-format data

System info

Telegraf 1.23.0

Docker

No response

Steps to reproduce

  1. Deploy kube-state-metrics image version 2.8.1 (registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.1)
  2. Configure the Telegraf prometheus input to scrape the target: kubernetes_services = ["http://kube-state-metrics.kube-system:8080/metrics"]
  3. Change the kube-state-metrics image back to version 2.7.0; the prometheus input works normally
    ...

Expected behavior

The prometheus input should scrape metrics normally; currently it only does so with kube-state-metrics image version 2.7.0.

Actual behavior

With kube-state-metrics image version 2.8.1, the plugin reports an error:
[inputs.prometheus] Error in plugin: error reading metrics for http://10.1.181.230:8080/metrics: reading metric family protocol buffer failed: proto: cannot parse invalid wire-format data

Additional info

No response

@tantm3 tantm3 added the bug unexpected problem or unintended behavior label Feb 27, 2023
@powersj
Contributor

powersj commented Feb 27, 2023

http://10.1.181.230:8080/metrics: reading metric family protocol buffer failed: proto: cannot parse invalid wire-format data

So this means that when Telegraf fetched that URL, it failed to parse the response. Can you confirm that the URL provides valid data in the Prometheus format to parse?
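
If it helps, here is a quick way to inspect both the payload and the response headers outside of Telegraf (a sketch, assuming kubectl access to the cluster and the service from the config above):

kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
# first lines of the scrape payload
curl -s http://localhost:8080/metrics | head -n 20
# status line and response headers for a default request
curl -s -o /dev/null -D - http://localhost:8080/metrics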

@tantm3
Author

tantm3 commented Feb 28, 2023

http://10.1.181.230:8080/metrics: reading metric family protocol buffer failed: proto: cannot parse invalid wire-format data

So this means that when Telegraf fetched that URL, it failed to parse the response. Can you confirm that the URL provides valid data in the Prometheus format to parse?

Both kube-state-metrics versions return valid data; I have attached part of the output.
I cannot find the data that causes the parsing error.
Could this be because the metric and label have too many characters?

This is the response from kube-state-metrics version 2.7.0:

# HELP kube_certificatesigningrequest_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_annotations gauge
kube_certificatesigningrequest_annotations{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client"} 1
# HELP kube_certificatesigningrequest_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_labels gauge
kube_certificatesigningrequest_labels{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client"} 1
# HELP kube_certificatesigningrequest_created [STABLE] Unix creation timestamp
# TYPE kube_certificatesigningrequest_created gauge
kube_certificatesigningrequest_created{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client"} 1.677472237e+09
# HELP kube_certificatesigningrequest_condition [STABLE] The number of each certificatesigningrequest condition
# TYPE kube_certificatesigningrequest_condition gauge
kube_certificatesigningrequest_condition{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client",condition="approved"} 1
kube_certificatesigningrequest_condition{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client",condition="denied"} 0
# HELP kube_certificatesigningrequest_cert_length [STABLE] Length of the issued cert
# TYPE kube_certificatesigningrequest_cert_length gauge
kube_certificatesigningrequest_cert_length{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client"} 0
# HELP kube_configmap_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_configmap_annotations gauge
kube_configmap_annotations{namespace="default",configmap="datadog-operator-lock"} 1
kube_configmap_annotations{namespace="default",configmap="datadog-cluster-id"} 1
kube_configmap_annotations{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="default",configmap="telegraf"} 1
kube_configmap_annotations{namespace="kube-system",configmap="kube-dns-autoscaler"} 1
kube_configmap_annotations{namespace="kube-system",configmap="coredns"} 1
kube_configmap_annotations{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_annotations{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_annotations{namespace="kube-public",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="default",configmap="datadog-custom-metrics"} 1
kube_configmap_annotations{namespace="default",configmap="datadog-leader-election"} 1
kube_configmap_annotations{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_annotations{namespace="default",configmap="datadogtoken"} 1
# HELP kube_configmap_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_configmap_labels gauge
kube_configmap_labels{namespace="default",configmap="datadog-operator-lock"} 1
kube_configmap_labels{namespace="default",configmap="datadog-cluster-id"} 1
kube_configmap_labels{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_labels{namespace="default",configmap="telegraf"} 1
kube_configmap_labels{namespace="kube-system",configmap="kube-dns-autoscaler"} 1
kube_configmap_labels{namespace="kube-system",configmap="coredns"} 1
kube_configmap_labels{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_labels{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_labels{namespace="kube-public",configmap="kube-root-ca.crt"} 1
kube_configmap_labels{namespace="default",configmap="datadog-custom-metrics"} 1
kube_configmap_labels{namespace="default",configmap="datadog-leader-election"} 1
kube_configmap_labels{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
kube_configmap_labels{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_labels{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_labels{namespace="default",configmap="datadogtoken"} 1
# HELP kube_configmap_info [STABLE] Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="kube-public",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="datadog-custom-metrics"} 1
kube_configmap_info{namespace="default",configmap="datadog-leader-election"} 1
kube_configmap_info{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_info{namespace="default",configmap="datadogtoken"} 1
kube_configmap_info{namespace="default",configmap="datadog-operator-lock"} 1
kube_configmap_info{namespace="default",configmap="datadog-cluster-id"} 1
kube_configmap_info{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="telegraf"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-dns-autoscaler"} 1
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
# HELP kube_configmap_created [STABLE] Unix creation timestamp
# TYPE kube_configmap_created gauge
kube_configmap_created{namespace="default",configmap="datadog-operator-lock"} 1.677208931e+09
kube_configmap_created{namespace="default",configmap="datadog-cluster-id"} 1.677209064e+09
kube_configmap_created{namespace="default",configmap="kube-root-ca.crt"} 1.677208229e+09
kube_configmap_created{namespace="default",configmap="telegraf"} 1.677232761e+09
kube_configmap_created{namespace="kube-system",configmap="kube-dns-autoscaler"} 1.677208419e+09
kube_configmap_created{namespace="kube-system",configmap="coredns"} 1.67720823e+09
kube_configmap_created{namespace="kube-system",configmap="extension-apiserver-authentication"} 1.677208216e+09
kube_configmap_created{namespace="kube-system",configmap="calico-config"} 1.67720823e+09
kube_configmap_created{namespace="kube-public",configmap="kube-root-ca.crt"} 1.677208229e+09
kube_configmap_created{namespace="default",configmap="datadog-custom-metrics"} 1.677224619e+09
kube_configmap_created{namespace="default",configmap="datadog-leader-election"} 1.677209064e+09
kube_configmap_created{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1.677208229e+09
kube_configmap_created{namespace="kube-system",configmap="kube-root-ca.crt"} 1.677208229e+09
kube_configmap_created{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1.677208232e+09
kube_configmap_created{namespace="default",configmap="datadogtoken"} 1.677224695e+09
# HELP kube_configmap_metadata_resource_version Resource version representing a specific version of the configmap.
# TYPE kube_configmap_metadata_resource_version gauge
kube_configmap_metadata_resource_version{namespace="default",configmap="telegraf"} 730464
kube_configmap_metadata_resource_version{namespace="default",configmap="datadog-operator-lock"} 40469
kube_configmap_metadata_resource_version{namespace="default",configmap="datadog-cluster-id"} 2973
kube_configmap_metadata_resource_version{namespace="default",configmap="kube-root-ca.crt"} 273
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="calico-config"} 356
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="kube-dns-autoscaler"} 1281
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="coredns"} 306
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="extension-apiserver-authentication"} 30
kube_configmap_metadata_resource_version{namespace="kube-public",configmap="kube-root-ca.crt"} 267
kube_configmap_metadata_resource_version{namespace="default",configmap="datadog-custom-metrics"} 42931
kube_configmap_metadata_resource_version{namespace="default",configmap="datadog-leader-election"} 822870
kube_configmap_metadata_resource_version{namespace="default",configmap="datadogtoken"} 822260
kube_configmap_metadata_resource_version{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 269
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="kube-root-ca.crt"} 279
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 452
# HELP kube_cronjob_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_cronjob_annotations gauge
# HELP kube_cronjob_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_cronjob_labels gauge
# HELP kube_cronjob_info [STABLE] Info about cronjob.
# TYPE kube_cronjob_info gauge
# HELP kube_cronjob_created [STABLE] Unix creation timestamp
# TYPE kube_cronjob_created gauge
# HELP kube_cronjob_status_active [STABLE] Active holds pointers to currently running jobs.
# TYPE kube_cronjob_status_active gauge
# HELP kube_cronjob_status_last_schedule_time [STABLE] LastScheduleTime keeps information of when was the last time the job was successfully scheduled.
# TYPE kube_cronjob_status_last_schedule_time gauge
# HELP kube_cronjob_status_last_successful_time LastSuccessfulTime keeps information of when was the last time the job was completed successfully.
# TYPE kube_cronjob_status_last_successful_time gauge
# HELP kube_cronjob_spec_suspend [STABLE] Suspend flag tells the controller to suspend subsequent executions.
# TYPE kube_cronjob_spec_suspend gauge
# HELP kube_cronjob_spec_starting_deadline_seconds [STABLE] Deadline in seconds for starting the job if it misses scheduled time for any reason.
# TYPE kube_cronjob_spec_starting_deadline_seconds gauge
# HELP kube_cronjob_next_schedule_time [STABLE] Next time the cronjob should be scheduled. The time after lastScheduleTime, or after the cron job's creation time if it's never been scheduled. Use this to determine if the job is delayed.
# TYPE kube_cronjob_next_schedule_time gauge
# HELP kube_cronjob_metadata_resource_version [STABLE] Resource version representing a specific version of the cronjob.
# TYPE kube_cronjob_metadata_resource_version gauge
# HELP kube_cronjob_spec_successful_job_history_limit Successful job history limit tells the controller how many completed jobs should be preserved.
# TYPE kube_cronjob_spec_successful_job_history_limit gauge
# HELP kube_cronjob_spec_failed_job_history_limit Failed job history limit tells the controller how many failed jobs should be preserved.
# TYPE kube_cronjob_spec_failed_job_history_limit gauge
# HELP kube_daemonset_created [STABLE] Unix creation timestamp
# TYPE kube_daemonset_created gauge
kube_daemonset_created{namespace="kube-system",daemonset="calico-node"} 1.67720823e+09
kube_daemonset_created{namespace="kube-system",daemonset="npd"} 1.677208232e+09
kube_daemonset_created{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 1.677208233e+09
kube_daemonset_created{namespace="default",daemonset="datadog-agent"} 1.677225621e+09
kube_daemonset_created{namespace="default",daemonset="telegraf"} 1.67731061e+09
kube_daemonset_created{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1.677208229e+09
# HELP kube_daemonset_status_current_number_scheduled [STABLE] The number of nodes running at least one daemon pod and are supposed to.
# TYPE kube_daemonset_status_current_number_scheduled gauge
kube_daemonset_status_current_number_scheduled{namespace="kube-system",daemonset="calico-node"} 2
kube_daemonset_status_current_number_scheduled{namespace="kube-system",daemonset="npd"} 2
kube_daemonset_status_current_number_scheduled{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 2
kube_daemonset_status_current_number_scheduled{namespace="default",daemonset="datadog-agent"} 2
kube_daemonset_status_current_number_scheduled{namespace="default",daemonset="telegraf"} 2
kube_daemonset_status_current_number_scheduled{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
# HELP kube_daemonset_status_desired_number_scheduled [STABLE] The number of nodes that should be running the daemon pod.
# TYPE kube_daemonset_status_desired_number_scheduled gauge
kube_daemonset_status_desired_number_scheduled{namespace="kube-system",daemonset="calico-node"} 2
kube_daemonset_status_desired_number_scheduled{namespace="kube-system",daemonset="npd"} 2
kube_daemonset_status_desired_number_scheduled{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 2
kube_daemonset_status_desired_number_scheduled{namespace="default",daemonset="datadog-agent"} 2
kube_daemonset_status_desired_number_scheduled{namespace="default",daemonset="telegraf"} 2
kube_daemonset_status_desired_number_scheduled{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
# HELP kube_daemonset_status_number_available [STABLE] The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available
# TYPE kube_daemonset_status_number_available gauge
kube_daemonset_status_number_available{namespace="kube-system",daemonset="calico-node"} 2
kube_daemonset_status_number_available{namespace="kube-system",daemonset="npd"} 2
kube_daemonset_status_number_available{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 2
kube_daemonset_status_number_available{namespace="default",daemonset="datadog-agent"} 2
kube_daemonset_status_number_available{namespace="default",daemonset="telegraf"} 2
kube_daemonset_status_number_available{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
# HELP kube_daemonset_status_number_misscheduled [STABLE] The number of nodes running a daemon pod but are not supposed to.
# TYPE kube_daemonset_status_number_misscheduled gauge
kube_daemonset_status_number_misscheduled{namespace="kube-system",daemonset="calico-node"} 0
kube_daemonset_status_number_misscheduled{namespace="kube-system",daemonset="npd"} 0
kube_daemonset_status_number_misscheduled{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 0
kube_daemonset_status_number_misscheduled{namespace="default",daemonset="datadog-agent"} 0
kube_daemonset_status_number_misscheduled{namespace="default",daemonset="telegraf"} 0
kube_daemonset_status_number_misscheduled{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 0
# HELP kube_daemonset_status_number_ready [STABLE] The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready.
# TYPE kube_daemonset_status_number_ready gauge

And this is from version 2.8.1:

# HELP kube_certificatesigningrequest_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_annotations gauge
kube_certificatesigningrequest_annotations{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client"} 1
# HELP kube_certificatesigningrequest_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_labels gauge
kube_certificatesigningrequest_labels{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client"} 1
# HELP kube_certificatesigningrequest_created [STABLE] Unix creation timestamp
# TYPE kube_certificatesigningrequest_created gauge
kube_certificatesigningrequest_created{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client"} 1.677472237e+09
# HELP kube_certificatesigningrequest_condition [STABLE] The number of each certificatesigningrequest condition
# TYPE kube_certificatesigningrequest_condition gauge
kube_certificatesigningrequest_condition{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client",condition="approved"} 1
kube_certificatesigningrequest_condition{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client",condition="denied"} 0
# HELP kube_certificatesigningrequest_cert_length [STABLE] Length of the issued cert
# TYPE kube_certificatesigningrequest_cert_length gauge
kube_certificatesigningrequest_cert_length{certificatesigningrequest="myuser",signer_name="kubernetes.io/kube-apiserver-client"} 0
# HELP kube_configmap_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_configmap_annotations gauge
kube_configmap_annotations{namespace="default",configmap="datadog-custom-metrics"} 1
kube_configmap_annotations{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="coredns"} 1
kube_configmap_annotations{namespace="default",configmap="telegraf"} 1
kube_configmap_annotations{namespace="default",configmap="datadogtoken"} 1
kube_configmap_annotations{namespace="default",configmap="datadog-leader-election"} 1
kube_configmap_annotations{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_annotations{namespace="kube-public",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_annotations{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_annotations{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="default",configmap="datadog-operator-lock"} 1
kube_configmap_annotations{namespace="default",configmap="datadog-cluster-id"} 1
kube_configmap_annotations{namespace="kube-system",configmap="kube-dns-autoscaler"} 1
# HELP kube_configmap_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_configmap_labels gauge
kube_configmap_labels{namespace="kube-system",configmap="kube-dns-autoscaler"} 1
kube_configmap_labels{namespace="default",configmap="datadog-operator-lock"} 1
kube_configmap_labels{namespace="default",configmap="datadog-cluster-id"} 1
kube_configmap_labels{namespace="default",configmap="telegraf"} 1
kube_configmap_labels{namespace="default",configmap="datadogtoken"} 1
kube_configmap_labels{namespace="default",configmap="datadog-leader-election"} 1
kube_configmap_labels{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_labels{namespace="default",configmap="datadog-custom-metrics"} 1
kube_configmap_labels{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_labels{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_labels{namespace="kube-system",configmap="coredns"} 1
kube_configmap_labels{namespace="kube-public",configmap="kube-root-ca.crt"} 1
kube_configmap_labels{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_labels{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_labels{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
# HELP kube_configmap_info [STABLE] Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_info{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="datadog-cluster-id"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-dns-autoscaler"} 1
kube_configmap_info{namespace="default",configmap="datadog-operator-lock"} 1
kube_configmap_info{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
kube_configmap_info{namespace="default",configmap="telegraf"} 1
kube_configmap_info{namespace="default",configmap="datadogtoken"} 1
kube_configmap_info{namespace="default",configmap="datadog-leader-election"} 1
kube_configmap_info{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_info{namespace="default",configmap="datadog-custom-metrics"} 1
kube_configmap_info{namespace="kube-public",configmap="kube-root-ca.crt"} 1
# HELP kube_configmap_created [STABLE] Unix creation timestamp
# TYPE kube_configmap_created gauge
kube_configmap_created{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1.677208229e+09
kube_configmap_created{namespace="kube-system",configmap="kube-dns-autoscaler"} 1.677208419e+09
kube_configmap_created{namespace="default",configmap="datadog-operator-lock"} 1.677208931e+09
kube_configmap_created{namespace="default",configmap="datadog-cluster-id"} 1.677209064e+09
kube_configmap_created{namespace="default",configmap="telegraf"} 1.677232761e+09
kube_configmap_created{namespace="default",configmap="datadogtoken"} 1.677224695e+09
kube_configmap_created{namespace="default",configmap="datadog-leader-election"} 1.677209064e+09
kube_configmap_created{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1.677208232e+09
kube_configmap_created{namespace="default",configmap="datadog-custom-metrics"} 1.677224619e+09
kube_configmap_created{namespace="default",configmap="kube-root-ca.crt"} 1.677208229e+09
kube_configmap_created{namespace="kube-system",configmap="kube-root-ca.crt"} 1.677208229e+09
kube_configmap_created{namespace="kube-system",configmap="coredns"} 1.67720823e+09
kube_configmap_created{namespace="kube-public",configmap="kube-root-ca.crt"} 1.677208229e+09
kube_configmap_created{namespace="kube-system",configmap="extension-apiserver-authentication"} 1.677208216e+09
kube_configmap_created{namespace="kube-system",configmap="calico-config"} 1.67720823e+09
# HELP kube_configmap_metadata_resource_version Resource version representing a specific version of the configmap.
# TYPE kube_configmap_metadata_resource_version gauge
kube_configmap_metadata_resource_version{namespace="default",configmap="datadog-operator-lock"} 40469
kube_configmap_metadata_resource_version{namespace="default",configmap="datadog-cluster-id"} 2973
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="kube-dns-autoscaler"} 1281
kube_configmap_metadata_resource_version{namespace="default",configmap="datadog-custom-metrics"} 42931
kube_configmap_metadata_resource_version{namespace="default",configmap="kube-root-ca.crt"} 273
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="kube-root-ca.crt"} 279
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="coredns"} 306
kube_configmap_metadata_resource_version{namespace="default",configmap="telegraf"} 730464
kube_configmap_metadata_resource_version{namespace="default",configmap="datadogtoken"} 823242
kube_configmap_metadata_resource_version{namespace="default",configmap="datadog-leader-election"} 823275
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 452
kube_configmap_metadata_resource_version{namespace="kube-public",configmap="kube-root-ca.crt"} 267
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="calico-config"} 356
kube_configmap_metadata_resource_version{namespace="kube-system",configmap="extension-apiserver-authentication"} 30
kube_configmap_metadata_resource_version{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 269
# HELP kube_cronjob_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_cronjob_annotations gauge
# HELP kube_cronjob_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_cronjob_labels gauge
# HELP kube_cronjob_info [STABLE] Info about cronjob.
# TYPE kube_cronjob_info gauge
# HELP kube_cronjob_created [STABLE] Unix creation timestamp
# TYPE kube_cronjob_created gauge
# HELP kube_cronjob_status_active [STABLE] Active holds pointers to currently running jobs.
# TYPE kube_cronjob_status_active gauge
# HELP kube_cronjob_status_last_schedule_time [STABLE] LastScheduleTime keeps information of when was the last time the job was successfully scheduled.
# TYPE kube_cronjob_status_last_schedule_time gauge
# HELP kube_cronjob_status_last_successful_time LastSuccessfulTime keeps information of when was the last time the job was completed successfully.
# TYPE kube_cronjob_status_last_successful_time gauge
# HELP kube_cronjob_spec_suspend [STABLE] Suspend flag tells the controller to suspend subsequent executions.
# TYPE kube_cronjob_spec_suspend gauge
# HELP kube_cronjob_spec_starting_deadline_seconds [STABLE] Deadline in seconds for starting the job if it misses scheduled time for any reason.
# TYPE kube_cronjob_spec_starting_deadline_seconds gauge
# HELP kube_cronjob_next_schedule_time [STABLE] Next time the cronjob should be scheduled. The time after lastScheduleTime, or after the cron job's creation time if it's never been scheduled. Use this to determine if the job is delayed.
# TYPE kube_cronjob_next_schedule_time gauge
# HELP kube_cronjob_metadata_resource_version [STABLE] Resource version representing a specific version of the cronjob.
# TYPE kube_cronjob_metadata_resource_version gauge
# HELP kube_cronjob_spec_successful_job_history_limit Successful job history limit tells the controller how many completed jobs should be preserved.
# TYPE kube_cronjob_spec_successful_job_history_limit gauge
# HELP kube_cronjob_spec_failed_job_history_limit Failed job history limit tells the controller how many failed jobs should be preserved.
# TYPE kube_cronjob_spec_failed_job_history_limit gauge
# HELP kube_daemonset_created [STABLE] Unix creation timestamp
# TYPE kube_daemonset_created gauge
kube_daemonset_created{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1.677208229e+09
kube_daemonset_created{namespace="kube-system",daemonset="calico-node"} 1.67720823e+09
kube_daemonset_created{namespace="kube-system",daemonset="npd"} 1.677208232e+09
kube_daemonset_created{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 1.677208233e+09
kube_daemonset_created{namespace="default",daemonset="datadog-agent"} 1.677225621e+09
kube_daemonset_created{namespace="default",daemonset="telegraf"} 1.67731061e+09
# HELP kube_daemonset_status_current_number_scheduled [STABLE] The number of nodes running at least one daemon pod and are supposed to.
# TYPE kube_daemonset_status_current_number_scheduled gauge
kube_daemonset_status_current_number_scheduled{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 2
kube_daemonset_status_current_number_scheduled{namespace="default",daemonset="datadog-agent"} 2
kube_daemonset_status_current_number_scheduled{namespace="default",daemonset="telegraf"} 2
kube_daemonset_status_current_number_scheduled{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
kube_daemonset_status_current_number_scheduled{namespace="kube-system",daemonset="calico-node"} 2
kube_daemonset_status_current_number_scheduled{namespace="kube-system",daemonset="npd"} 2
# HELP kube_daemonset_status_desired_number_scheduled [STABLE] The number of nodes that should be running the daemon pod.
# TYPE kube_daemonset_status_desired_number_scheduled gauge
kube_daemonset_status_desired_number_scheduled{namespace="kube-system",daemonset="calico-node"} 2
kube_daemonset_status_desired_number_scheduled{namespace="kube-system",daemonset="npd"} 2
kube_daemonset_status_desired_number_scheduled{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 2
kube_daemonset_status_desired_number_scheduled{namespace="default",daemonset="datadog-agent"} 2
kube_daemonset_status_desired_number_scheduled{namespace="default",daemonset="telegraf"} 2
kube_daemonset_status_desired_number_scheduled{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
# HELP kube_daemonset_status_number_available [STABLE] The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available
# TYPE kube_daemonset_status_number_available gauge
kube_daemonset_status_number_available{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 2
kube_daemonset_status_number_available{namespace="default",daemonset="datadog-agent"} 2
kube_daemonset_status_number_available{namespace="default",daemonset="telegraf"} 2
kube_daemonset_status_number_available{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
kube_daemonset_status_number_available{namespace="kube-system",daemonset="calico-node"} 2
kube_daemonset_status_number_available{namespace="kube-system",daemonset="npd"} 2
# HELP kube_daemonset_status_number_misscheduled [STABLE] The number of nodes running a daemon pod but are not supposed to.
# TYPE kube_daemonset_status_number_misscheduled gauge
kube_daemonset_status_number_misscheduled{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 0
kube_daemonset_status_number_misscheduled{namespace="default",daemonset="datadog-agent"} 0
kube_daemonset_status_number_misscheduled{namespace="default",daemonset="telegraf"} 0
kube_daemonset_status_number_misscheduled{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 0
kube_daemonset_status_number_misscheduled{namespace="kube-system",daemonset="calico-node"} 0
kube_daemonset_status_number_misscheduled{namespace="kube-system",daemonset="npd"} 0
# HELP kube_daemonset_status_number_ready [STABLE] The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready.
# TYPE kube_daemonset_status_number_ready gauge
kube_daemonset_status_number_ready{namespace="default",daemonset="telegraf"} 2
kube_daemonset_status_number_ready{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
kube_daemonset_status_number_ready{namespace="kube-system",daemonset="calico-node"} 2
kube_daemonset_status_number_ready{namespace="kube-system",daemonset="npd"} 2
kube_daemonset_status_number_ready{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 2
kube_daemonset_status_number_ready{namespace="default",daemonset="datadog-agent"} 2
# HELP kube_daemonset_status_number_unavailable [STABLE] The number of nodes that should be running the daemon pod and have none of the daemon pod running and available
# TYPE kube_daemonset_status_number_unavailable gauge
kube_daemonset_status_number_unavailable{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 0
kube_daemonset_status_number_unavailable{namespace="default",daemonset="datadog-agent"} 0
kube_daemonset_status_number_unavailable{namespace="default",daemonset="telegraf"} 0
kube_daemonset_status_number_unavailable{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 0
kube_daemonset_status_number_unavailable{namespace="kube-system",daemonset="calico-node"} 0
kube_daemonset_status_number_unavailable{namespace="kube-system",daemonset="npd"} 0
# HELP kube_daemonset_status_observed_generation [STABLE] The most recent generation observed by the daemon set controller.
# TYPE kube_daemonset_status_observed_generation gauge
kube_daemonset_status_observed_generation{namespace="kube-system",daemonset="npd"} 1
kube_daemonset_status_observed_generation{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 1
kube_daemonset_status_observed_generation{namespace="default",daemonset="datadog-agent"} 1
kube_daemonset_status_observed_generation{namespace="default",daemonset="telegraf"} 18
kube_daemonset_status_observed_generation{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
kube_daemonset_status_observed_generation{namespace="kube-system",daemonset="calico-node"} 1
# HELP kube_daemonset_status_updated_number_scheduled [STABLE] The total number of nodes that are running updated daemon pod
# TYPE kube_daemonset_status_updated_number_scheduled gauge
kube_daemonset_status_updated_number_scheduled{namespace="default",daemonset="telegraf"} 2
kube_daemonset_status_updated_number_scheduled{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
kube_daemonset_status_updated_number_scheduled{namespace="kube-system",daemonset="calico-node"} 2
kube_daemonset_status_updated_number_scheduled{namespace="kube-system",daemonset="npd"} 2
kube_daemonset_status_updated_number_scheduled{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 2
kube_daemonset_status_updated_number_scheduled{namespace="default",daemonset="datadog-agent"} 2
# HELP kube_daemonset_metadata_generation [STABLE] Sequence number representing a specific generation of the desired state.
# TYPE kube_daemonset_metadata_generation gauge
kube_daemonset_metadata_generation{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
kube_daemonset_metadata_generation{namespace="kube-system",daemonset="calico-node"} 1
kube_daemonset_metadata_generation{namespace="kube-system",daemonset="npd"} 1
kube_daemonset_metadata_generation{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 1
kube_daemonset_metadata_generation{namespace="default",daemonset="datadog-agent"} 1
kube_daemonset_metadata_generation{namespace="default",daemonset="telegraf"} 18
# HELP kube_daemonset_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_daemonset_annotations gauge
kube_daemonset_annotations{namespace="default",daemonset="datadog-agent"} 1
kube_daemonset_annotations{namespace="default",daemonset="telegraf"} 1
kube_daemonset_annotations{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
kube_daemonset_annotations{namespace="kube-system",daemonset="calico-node"} 1
kube_daemonset_annotations{namespace="kube-system",daemonset="npd"} 1
kube_daemonset_annotations{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 1
# HELP kube_daemonset_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_daemonset_labels gauge
kube_daemonset_labels{namespace="kube-system",daemonset="openstack-cloud-controller-manager"} 1
kube_daemonset_labels{namespace="kube-system",daemonset="calico-node"} 1
kube_daemonset_labels{namespace="kube-system",daemonset="npd"} 1
kube_daemonset_labels{namespace="kube-system",daemonset="csi-cinder-nodeplugin"} 1
kube_daemonset_labels{namespace="default",daemonset="datadog-agent"} 1
kube_daemonset_labels{namespace="default",daemonset="telegraf"} 1

@powersj
Contributor

powersj commented Feb 28, 2023

That looks OK as well.

Do you get the same with v2.8.0? I'm wondering if kubernetes/kube-state-metrics#1974 is causing your issue.

@tantm3
Author

tantm3 commented Mar 1, 2023

      - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
          protocol: TCP
        - containerPort: 8081
          name: telemetry
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}

I changed the image to version 2.8.0 and it has the same error:

2023-03-01T03:25:30Z E! [inputs.prometheus] Error in plugin: error reading metrics for http://10.1.181.239:8080/metrics: reading metric family protocol buffer failed: proto: cannot parse invalid wire-format data

According to kubernetes/kube-state-metrics#1974, it seems the problem occurs while scraping the metrics over HTTP rather than during the metric-parsing phase, right?

@powersj
Contributor

powersj commented Mar 1, 2023

According to kubernetes/kube-state-metrics#1974, it seems the problem occurs while scraping the metrics over HTTP rather than during the metric-parsing phase, right?

The reason I said it happens during parsing is that the "error reading metrics for ..." error occurs right after we attempt to parse the metrics. If we had an issue making the HTTP request, the error would be "error making HTTP request to ..." (see here).

The next error in the chain "reading metric family protocol buffer failed" comes from the prometheus v1 parser where it tries to read protobuf data. This is checked via the media type and params of the HTTP headers and seems to return true in your case.
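
That check can be reproduced outside of Telegraf with curl by sending the Accept header Telegraf uses on its requests (a sketch; adjust the URL for your cluster):

curl -s -o /dev/null -D - \
  -H 'Accept: application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3' \
  http://kube-state-metrics.kube-system:8080/metrics | grep -i '^content-type'

If the Content-Type comes back as application/vnd.google.protobuf with the MetricFamily proto parameter, Telegraf hands the body to the protobuf reader.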

I assume you are using kubernetes_services rather than urls because you need Telegraf to do additional lookups to find the DNS records? Otherwise, if you have direct access to the URL, have you tried urls?

@tantm3
Author

tantm3 commented Mar 2, 2023

Regarding your question about why I use the kubernetes_services config:
Yes, I think it's better to use kubernetes_services to scrape metrics because we only connect from inside the cluster.
This is the configuration that I used:

    [[inputs.prometheus]]
      ## An array of urls to scrape metrics from.
      urls = ["http://kube-state-metrics.kube-system:8080/metrics"]

      ## Metric version controls the mapping from Prometheus metrics into Telegraf metrics.
      ## See "Metric Format Configuration" in plugins/inputs/prometheus/README.md for details.
      ## Valid options: 1, 2
      # metric_version = 1

      ## Url tag name (tag containing scraped url. optional, default is "url")
      # url_tag = "url"

      ## Whether the timestamp of the scraped metrics will be ignored.
      ## If set to true, the gather time will be used.
      # ignore_timestamp = false

      ## An array of Kubernetes services to scrape metrics from.
      ## kubernetes_services = ["http://kube-state-metrics.kube-system:8080/metrics"]

      ##[inputs.prometheus.tags]
      ##cluster = "${CLUSTER_NAME}"

      ## Kubernetes config file to create client from.
      # kube_config = "/path/to/kubernetes.config"

      ## Scrape Pods
      ## Enable scraping of k8s pods. Further settings as to which pods to scrape
      ## are determined by the 'method' option below. When enabled, the default is
      ## to use annotations to determine whether to scrape or not.
      # monitor_kubernetes_pods = false

      ## Scrape Pods Method
      ## annotations: default, looks for specific pod annotations documented below
      ## settings: only look for pods matching the settings provided, not
      ##   annotations
      ## settings+annotations: looks at pods that match annotations using the user
      ##   defined settings
      # monitor_kubernetes_pods_method = "annotations"

      ## Scrape Pods 'annotations' method options
      ## If the method is set to 'annotations' or 'settings+annotations', these
      ## annotation flags are looked for:
      ## - prometheus.io/scrape: Required to enable scraping for this pod. Can also
      ##     use 'prometheus.io/scrape=false' annotation to opt-out entirely.
      ## - prometheus.io/scheme: If the metrics endpoint is secured then you will
      ##     need to set this to 'https' & most likely set the tls config
      ## - prometheus.io/path: If the metrics path is not /metrics, define it with
      ##     this annotation
      ## - prometheus.io/port: If port is not 9102 use this annotation

      ## Scrape Pods 'settings' method options
      ## When using 'settings' or 'settings+annotations', the default values for
      ## annotations can be modified with the following options:
      # monitor_kubernetes_pods_scheme = "http"
      # monitor_kubernetes_pods_port = "9102"
      # monitor_kubernetes_pods_path = "/metrics"

      ## Get the list of pods to scrape with either the scope of
      ## - cluster: the kubernetes watch api (default, no need to specify)
      ## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
      # pod_scrape_scope = "cluster"

      ## Only for node scrape scope: node IP of the node that telegraf is running on.
      ## Either this config or the environment variable NODE_IP must be set.
      # node_ip = "10.180.1.1"

      ## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
      ## Default is 60 seconds.
      # pod_scrape_interval = 60

      ## Restricts Kubernetes monitoring to a single namespace
      ##   ex: monitor_kubernetes_pods_namespace = "default"
      # monitor_kubernetes_pods_namespace = ""
      ## The name of the label for the pod that is being scraped.
      ## Default is 'namespace' but this can conflict with metrics that have the label 'namespace'
      # pod_namespace_label_name = "namespace"
      # label selector to target pods which have the label
      # kubernetes_label_selector = "env=dev,app=nginx"
      # field selector to target pods
      # eg. To scrape pods on a specific node
      # kubernetes_field_selector = "spec.nodeName=$HOSTNAME"

      # cache refresh interval to set the interval for re-sync of pods list.
      # Default is 60 minutes.
      # cache_refresh_interval = 60

      ## Scrape Services available in Consul Catalog
      # [inputs.prometheus.consul]
      #   enabled = true
      #   agent = "http://localhost:8500"
      #   query_interval = "5m"

      #   [[inputs.prometheus.consul.query]]
      #     name = "a service name"
      #     tag = "a service tag"
      #     url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
      #     [inputs.prometheus.consul.query.tags]
      #       host = "{{.Node}}"

      ## Use bearer token for authorization. ('bearer_token' takes priority)
      # bearer_token = "/path/to/bearer/token"
      ## OR
      # bearer_token_string = "abc_123"

      ## HTTP Basic Authentication username and password. ('bearer_token' and
      ## 'bearer_token_string' take priority)
      # username = ""
      # password = ""

      ## Optional custom HTTP headers
      # http_headers = {"X-Special-Header" = "Special-Value"}

      ## Specify timeout duration for slower prometheus clients (default is 3s)
      # timeout = "3s"
      
      ##   deprecated in 1.26; use the timeout option
      # response_timeout = "3s"
      
      ## HTTP Proxy support
      # use_system_proxy = false
      # http_proxy_url = ""

      ## Optional TLS Config
      # tls_ca = /path/to/cafile
      # tls_cert = /path/to/certfile
      # tls_key = /path/to/keyfile

      ## Use TLS but skip chain & host verification
      # insecure_skip_verify = false

I tried the urls config, but it showed the same error:

2023-03-02T02:28:00Z E! [inputs.prometheus] Error in plugin: error reading metrics for http://kube-state-metrics.kube-system:8080/metrics: reading metric family protocol buffer failed: proto: cannot parse invalid wire-format data

@powersj
Contributor

powersj commented Mar 2, 2023

I have put up #12779, which will print out some debug messages. I would like to see what the full response body and headers look like, and which protobuf message we are failing to parse.

In 20-30 minutes that PR will have artifacts attached. Can you download one of them, run with it, and provide the full output, please? Thanks!

I still think this is related to the OpenMetrics format being slightly different, in this case because the protobuf parser does not understand it. Looking at the prometheus/client_model repo, where we get the protobuf format from, there is an issue about updating the protobuf format for OpenMetrics.

@tantm3
Author

tantm3 commented Mar 3, 2023

Thanks for your detailed explanation!
I will try it and give you the debug output.

@iMarvinS

iMarvinS commented Mar 4, 2023

I'm also using kube-state-metrics v2.8.1 and experience exactly the same issue.
I've run the telegraf binary from the artifacts.

Here is the output:
[screenshot of the debug output]

@powersj
Contributor

powersj commented Mar 6, 2023

Hi,

Thanks for that output.

Can either of you try setting the http header in your config, like:

http_headers = {"Accept" = "text/plain"}

Based on kubernetes/kube-state-metrics#1974, that should revert to the old-style metrics.

Thanks!
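
As an independent check, curl can send only a plain-text Accept header to see whether the endpoint switches back to the text exposition format (a sketch; adjust the URL for your cluster):

curl -s -o /dev/null -D - -H 'Accept: text/plain' \
  http://kube-state-metrics.kube-system:8080/metrics | grep -i '^content-type'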

@tantm3
Author

tantm3 commented Mar 7, 2023

It seems that Telegraf version >= 1.25 supports custom HTTP headers in inputs.prometheus (#12364).
I am using version 1.23, so I can't run the test you advised.
How can I add a custom HTTP header with version 1.23?
This is the error I get when I add the http_headers config:

2023-03-07T02:41:22Z E! [telegraf] Error running agent: Error loading config file /etc/telegraf/telegraf.conf: plugin inputs.prometheus: line 30: configuration specified the fields ["http_headers"], but they weren't used

@powersj
Contributor

powersj commented Mar 7, 2023

How can I add a custom HTTP header with version 1.23?

Unfortunately, you cannot :) You would need to use a newer version with the feature. Any chance you could try this with the newer version somewhere?

@tantm3
Author

tantm3 commented Mar 9, 2023

I have some customizations on top of the Telegraf agent, so I must merge them into Telegraf version 1.25.
I will upgrade the Telegraf version and give you the output of the test.

@SrDayne

SrDayne commented Mar 14, 2023

http_headers = {"Accept" = "text/plain"}

Hello. I set this header in my Telegraf config and still get the same error. My setup:

Telegraf version: 1.25.3
Prometheus plugin config:

[[inputs.prometheus]]
    urls = ["http://prometheus-prometheus-node-exporter.monitoring.svc.cluster.local:9100/metrics",
            "http://prometheus-kube-state-metrics.monitoring.svc.cluster.local:8080/metrics"]
    bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    insecure_skip_verify = true
    metric_version = 2
    http_headers = {"Accept" = "text/plain"}

Metrics:

# HELP kube_pod_container_state_started [STABLE] Start time in unix timestamp for a pod container.
# TYPE kube_pod_container_state_started gauge
kube_pod_container_state_started{namespace="monitoring",pod="prometheus-kube-state-metrics-7c9d9988b8-b6t5d",uid="0efcf9c0-46fa-477f-95c8-a32cfcd3dee8",container="kube-state-metrics"} 1.678373522e+09
kube_pod_container_state_started{namespace="argocd",pod="argocd-notifications-controller-6cccb7cc45-687z2",uid="3b09231b-19f0-4abf-a265-0c632fe7b824",container="argocd-notifications-controller"} 1.67775691e+09
kube_pod_container_state_started{namespace="istio-system",pod="istio-ingressgateway-7954975d69-9r55r",uid="69a12325-9be8-4bcb-8f76-7204f5284f9d",container="istio-proxy"} 1.677491525e+09
kube_pod_container_state_started{namespace="kube-system",pod="aws-load-balancer-controller-78f6944bc6-jmgld",uid="cec67306-f308-4041-acb7-594985d18c42",container="controller"} 1.677597526e+09
kube_pod_container_state_started{namespace="kube-system",pod="aws-node-7vw4g",uid="2e71e696-2123-4c4d-948e-1658e0e1f548",container="aws-node"} 1.671792879e+09

@telegraf-tiger telegraf-tiger bot removed the waiting for response label Mar 14, 2023
@powersj
Contributor

powersj commented Mar 14, 2023

I believe we now need to wait on this upstream issue: kubernetes/kube-state-metrics#2022

Even with the Accept header set to text/plain, it returns a Content-Type of:

Content-Type:[application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily;

which causes Telegraf to attempt to parse the data as protobuf, when it is not.

In summary:

  1. kube-state-metrics started sending results in OpenMetrics format. The prometheus/client_model repo has an issue to update the protobuf spec to support this (Histograms: Port the protobuf changes to the OpenMetrics protobuf spec, prometheus/client_model#60), but until that is resolved Telegraf will not be able to parse the OpenMetrics format.

  2. The "workaround" to force the previous behavior with kube-state-metrics is to include the http_headers = {"Accept" = "text/plain"}. However, that is broken until KSM 2.8.0 can't be scraped by Prometheus if Native Histograms is enabled kubernetes/kube-state-metrics#2022 is resolved and released to actually change the response headers.

Steps to reproduce with a K8s cluster running KSM:

kind create cluster --name ksm
kind export kubeconfig --name=ksm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install ksm prometheus-community/kube-state-metrics
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/instance=ksm" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 8080
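
Once the port-forward is up, you can check the negotiated content type directly; this curl check is illustrative and was not part of the original steps:

# Request plain text explicitly and print only the response headers.
curl -s -o /dev/null -D - -H 'Accept: text/plain' http://localhost:8080/metrics

On an affected KSM release, the Content-Type still comes back as application/vnd.google.protobuf; a fixed release answers with text/plain.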

I would prefer to close this, as there is nothing for us to do in Telegraf currently. If the protobuf spec gets updated for OpenMetrics support down the road, I would be happy to see an issue or even a PR to update that dependency to support OpenMetrics.

Thanks!

@powersj powersj added the waiting for response label Mar 14, 2023
@tantm3
Author

tantm3 commented Mar 15, 2023

Let's pause the discussion here and await an update from kube-state-metrics!

@telegraf-tiger telegraf-tiger bot removed the waiting for response label Mar 15, 2023
@powersj powersj added the waiting for response label Mar 15, 2023
@CatherineF-dev

FYI: it's fixed in v2.8.2

@telegraf-tiger telegraf-tiger bot removed the waiting for response label Mar 17, 2023
@qistoph

qistoph commented Mar 18, 2023

I seem to be having the same issue with metrics from Authentik.

telegraf    | 2023-03-18T22:07:30Z E! [inputs.prometheus] Error in plugin: error reading metrics for "http://server:9300/metrics": reading metric family protocol buffer failed: proto: cannot parse invalid wire-format data

The http_headers configuration option does not solve it, because the value is added alongside the original Accept header instead of replacing it.

[[inputs.prometheus]]
  urls = ["http://server:9300/metrics"]
  name_prefix = "authentik_"
  http_headers = {"Accept" = "text/plain", "Accept-Too" = "test/asdf"}

Headers in the request:

Host: server:9300
User-Agent: Telegraf/1.26.0 Go/1.20.2
Accept: application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3
Accept: text/plain
Accept-Too: test/asdf
Accept-Encoding: gzip

Tcpdump:

22:07:30.031580 IP 192.168.0.3.43244 > 192.168.0.2.9300: Flags [P.], seq 1:287, ack 1, win 502, options [nop,nop,TS val 751130760 ecr 2090594872], length 286
        0x0000:  4500 0152 82cf 4000 4006 3581 c0a8 0003  E..R..@[email protected].....
        0x0010:  c0a8 0002 a8ec 2454 013d 7207 4cae 67ca  ......$T.=r.L.g.
        0x0020:  8018 01f6 829a 0000 0101 080a 2cc5 5888  ............,.X.
        0x0030:  7c9b f238 4745 5420 2f6d 6574 7269 6373  |..8GET./metrics
        0x0040:  2048 5454 502f 312e 310d 0a48 6f73 743a  .HTTP/1.1..Host:
        0x0050:  2073 6572 7665 723a 3933 3030 0d0a 5573  .server:9300..Us
        0x0060:  6572 2d41 6765 6e74 3a20 5465 6c65 6772  er-Agent:.Telegr
        0x0070:  6166 2f31 2e32 362e 3020 476f 2f31 2e32  af/1.26.0.Go/1.2
        0x0080:  302e 320d 0a41 6363 6570 743a 2061 7070  0.2..Accept:.app
        0x0090:  6c69 6361 7469 6f6e 2f76 6e64 2e67 6f6f  lication/vnd.goo
        0x00a0:  676c 652e 7072 6f74 6f62 7566 3b70 726f  gle.protobuf;pro
        0x00b0:  746f 3d69 6f2e 7072 6f6d 6574 6865 7573  to=io.prometheus
        0x00c0:  2e63 6c69 656e 742e 4d65 7472 6963 4661  .client.MetricFa
        0x00d0:  6d69 6c79 3b65 6e63 6f64 696e 673d 6465  mily;encoding=de
        0x00e0:  6c69 6d69 7465 643b 713d 302e 372c 7465  limited;q=0.7,te
        0x00f0:  7874 2f70 6c61 696e 3b76 6572 7369 6f6e  xt/plain;version
        0x0100:  3d30 2e30 2e34 3b71 3d30 2e33 0d0a 4163  =0.0.4;q=0.3..Ac
        0x0110:  6365 7074 3a20 7465 7874 2f70 6c61 696e  cept:.text/plain
        0x0120:  0d0a 4163 6365 7074 2d54 6f6f 3a20 7465  ..Accept-Too:.te
        0x0130:  7374 2f61 7364 660d 0a41 6363 6570 742d  st/asdf..Accept-
        0x0140:  456e 636f 6469 6e67 3a20 677a 6970 0d0a  Encoding:.gzip..
        0x0150:  0d0a                                     ..
22:07:30.031643 IP 192.168.0.2.9300 > 192.168.0.3.43244: Flags [.], ack 287, win 507, options [nop,nop,TS val 2090594876 ecr 751130760], length 0
        0x0000:  4500 0034 c0a6 4000 4006 f8c7 c0a8 0002  E..4..@.@.......
        0x0010:  c0a8 0003 2454 a8ec 4cae 67ca 013d 7325  ....$T..L.g..=s%
        0x0020:  8010 01fb 817c 0000 0101 080a 7c9b f23c  .....|......|..<
        0x0030:  2cc5 5888                                ,.X.
22:07:30.032980 IP 192.168.0.2.9300 > 192.168.0.3.43244: Flags [P.], seq 1:4097, ack 287, win 507, options [nop,nop,TS val 2090594878 ecr 751130760], length 4096
        0x0000:  4500 1034 c0a7 4000 4006 e8c6 c0a8 0002  E..4..@.@.......
        0x0010:  c0a8 0003 2454 a8ec 4cae 67ca 013d 7325  ....$T..L.g..=s%
        0x0020:  8018 01fb 917c 0000 0101 080a 7c9b f23e  .....|......|..>
        0x0030:  2cc5 5888 4854 5450 2f31 2e31 2032 3030  ,.X.HTTP/1.1.200
        0x0040:  204f 4b0d 0a43 6f6e 7465 6e74 2d54 7970  .OK..Content-Typ
        0x0050:  653a 2061 7070 6c69 6361 7469 6f6e 2f76  e:.application/v
        0x0060:  6e64 2e67 6f6f 676c 652e 7072 6f74 6f62  nd.google.protob
        0x0070:  7566 3b20 7072 6f74 6f3d 696f 2e70 726f  uf;.proto=io.pro
        0x0080:  6d65 7468 6575 732e 636c 6965 6e74 2e4d  metheus.client.M
        0x0090:  6574 7269 6346 616d 696c 793b 2065 6e63  etricFamily;.enc
        0x00a0:  6f64 696e 673d 6465 6c69 6d69 7465 640d  oding=delimited.
        0x00b0:  0a44 6174 653a 2053 6174 2c20 3138 204d  .Date:.Sat,.18.M
        0x00c0:  6172 2032 3032 3320 3232 3a30 373a 3330  ar.2023.22:07:30
        0x00d0:  2047 4d54 0d0a 5472 616e 7366 6572 2d45  .GMT..Transfer-E
        0x00e0:  6e63 6f64 696e 673a 2063 6875 6e6b 6564  ncoding:.chunked
        0x00f0:  0d0a 0d0a 3830 300d 0af3 010a 1761 7574  ....800......aut
        0x0100:  6865 6e74 696b 5f6d 6169 6e5f 7265 7175  hentik_main_requ
        0x0110:  6573 7473 1228 5468 6520 746f 7461 6c20  ests.(The.total.
        0x0120:  6e75 6d62 6572 206f 6620 636f 6e66 6967  number.of.config
        0x0130:  7572 6564 2070 726f 7669 6465 7273 1804  ured.providers..
        0x0140:  22ab 010a 0c0a 0464 6573 7412 0463 6f72  "......dest..cor
        0x0150:  653a 9a01 0861 1100 0000 007c 7921 411a  e:...a.....|y!A.
        0x0160:  0b08 0011 7b14 ae47 e17a 743f 1a0b 0800  ....{..G.zt?....
        0x0170:  117b 14ae 47e1 7a84 3f1a 0b08 0011 9a99  .{..G.z.?.......
        0x0180:  9999 9999 993f 1a0b 0800 119a 9999 9999  .....?..........
        0x0190:  99a9 3f1a 0b08 0011 9a99 9999 9999 b93f  ..?............?
        0x01a0:  1a0b 0800 1100 0000 0000 00d0 3f1a 0b08  ............?...
        0x01b0:  0011 0000 0000 0000 e03f 1a0b 0800 1100  .........?......
        0x01c0:  0000 0000 00f0 3f1a 0b08 0011 0000 0000  ......?.........
        0x01d0:  0000 0440 1a0b 0800 1100 0000 0000 0014  ...@............
       **SNIP***
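
My guess, based on the capture above rather than on the plugin source, is that the user-supplied headers are appended with Go's http.Header.Add instead of replacing the default via Set, which would explain the two Accept lines:

package main

import (
	"fmt"
	"net/http"
)

func main() {
	h := http.Header{}
	// The plugin's default Accept header (abbreviated here)...
	h.Set("Accept", "application/vnd.google.protobuf;...;q=0.7,text/plain;q=0.3")
	// ...then a user-supplied header appended with Add: both values are sent.
	h.Add("Accept", "text/plain")
	fmt.Println(h.Values("Accept")) // two entries, as in the capture above

	// Set would replace the default, leaving a single Accept value.
	h.Set("Accept", "text/plain")
	fmt.Println(h.Values("Accept")) // one entry
}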

@tantm3
Author

tantm3 commented Mar 19, 2023

FYI: it's fixed in v2.8.2

Thanks for the information.
I upgraded kube-state-metrics to version 2.8.2 and it works without any other changes.
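
For anyone following the Helm-based steps earlier in this thread, the upgrade looks roughly like this (the release name matches the earlier example; the image.tag value name is assumed from the prometheus-community chart, not taken from this thread):

helm repo update
helm upgrade ksm prometheus-community/kube-state-metrics --set image.tag=v2.8.2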

I think we can close this conversation now.
