Prometheus [Input] plugin - Optimizing for bigger Kubernetes clusters (500+ pods) when scraping through 'monitor_kubernetes_pods' #8705
Labels
area/prometheus, feat, plugin/input
Feature Request
Current behavior:
Currently, when 'monitor_kubernetes_pods = true', Telegraf watches the Kubernetes API server for pods carrying the scrape annotations and scrapes them as they come and go (in all namespaces or in the specified namespaces). This works for smaller clusters, but it does not scale for larger clusters (more than 500 pods), especially when a single Telegraf pod performs the scraping for every pod in the cluster. That single pod is also a single point of failure for annotation-based scraping with the Telegraf Prometheus input plugin.
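For illustration, the annotation-based selection described above can be sketched as a small filter over a pod list. This is a simplified sketch, not Telegraf's actual code; the dictionary shapes mirror Kubernetes pod objects, and the `prometheus.io/*` annotation keys are the conventional ones the plugin watches for. The node filter shown here is what the proposal below would add:

```python
def select_scrape_targets(pods, node_name):
    """Return scrape URLs for annotated pods scheduled on node_name.

    `pods` is a list of Kubernetes pod objects (as dicts). Pods are
    selected when they carry the prometheus.io/scrape=true annotation;
    port and path fall back to the plugin's conventional defaults.
    """
    targets = []
    for pod in pods:
        # Node filter: the behavior the proposed local mode would add.
        if pod["spec"].get("nodeName") != node_name:
            continue
        annotations = pod["metadata"].get("annotations", {})
        if annotations.get("prometheus.io/scrape") != "true":
            continue
        port = annotations.get("prometheus.io/port", "9102")
        path = annotations.get("prometheus.io/path", "/metrics")
        pod_ip = pod["status"].get("podIP")
        if pod_ip:
            targets.append(f"http://{pod_ip}:{port}{path}")
    return targets
```

With thousands of pods cluster-wide, running this selection once per node over the local kubelet's pod list is far cheaper than one central watcher handling every pod.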
Proposal:
Introduce an additional option (perhaps 'local_mode', or something more intuitive) which, when true, discovers only the pods running on the local node. Telegraf would fetch the pod list for its node from the node's kubelet (instead of watching through the API server as it does today) and scrape the pods carrying the same annotations as today. This requires running Telegraf as a DaemonSet (one instance per node), with each instance scraping only its own node's pods. The option would default to off/false, so current behavior stays backward compatible, and users can turn it on as they see the need.
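A rough sketch of what the configuration could look like. Note that 'local_mode' and 'node_name' are the hypothetical options proposed here, not existing plugin settings; 'monitor_kubernetes_pods' is the existing one. The node name could be injected into each DaemonSet pod via the Kubernetes Downward API:

```toml
[[inputs.prometheus]]
  ## Existing option: discover annotated pods via the API server.
  monitor_kubernetes_pods = true

  ## Proposed (hypothetical) option: restrict discovery to pods on this
  ## node, fetched from the local kubelet instead of the API server.
  # local_mode = true

  ## Proposed (hypothetical) option: the node this Telegraf instance runs
  ## on, e.g. populated from an env var set via the Downward API
  ## (fieldRef: spec.nodeName) in the DaemonSet spec.
  # node_name = "$NODE_NAME"
```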
Desired behavior:
Pod-annotation-based scraping through Telegraf scales as the Kubernetes cluster scales.
Use case:
As Kubernetes becomes the de facto platform for running workloads, most production clusters are growing, and Prometheus metric sources and metrics are widely available. To monitor them through Telegraf, we need a reliable way for Telegraf to collect metrics that scales with the cluster.