Prometheus remote_write rate calculations is showing as zero #38458

Closed
gizas opened this issue Mar 20, 2024 · 3 comments
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team


gizas commented Mar 20, 2024

Investigate why the Prometheus integration, when configured with remote_write, shows the prometheus.*.rate fields with a value of 0. A user reported that they can see the prometheus.*.counter fields, but the rate fields show no value.
Information:

  • TSDS enabled
  • Prometheus integration version: 1.13.1
  • Beats version: 8.11
gizas added the Team:Cloudnative-Monitoring label on Mar 20, 2024
gizas self-assigned this on Mar 20, 2024
gizas commented Mar 20, 2024

The issue is easily reproducible when you enable remote_write with both rate_counters and use_types set to true:

[Screenshot 2024-03-20 at 12:12:25 PM]

gizas commented Mar 21, 2024

I am currently troubleshooting Metricbeat with the Prometheus integration's remote_write metricset.

tl;dr - Short Summary

We don't correctly initialise the internal counter cache that keeps counter values between fetches in order to evaluate rates.

Long description

Rate calculation for counter types is done in the rateCounterFloat64 function, whose receiver is remoteWriteTypedGenerator.

The remoteWriteTypedGenerator initialises its counter cache with newCounterCache, using a timeout based on the config.Period parameter.
As the code comment explains:

// use a counter cache with a timeout of 5x the period, as a safe value
// to make sure that all counters are available between fetches

In the Prometheus collector metricset, we initialise Period with a default value of 10s, which seems to be why the issue does not occur with the collector.

In remote_write, no such Period value is provided in the elastic-agent config, so the cache timeout is derived from an unset period.

Testing Results

With Metricbeat, I am using the following config:

- module: prometheus
  metricsets: ["remote_write"]
  host: "0.0.0.0"
  port: "9201"
  use_types: true
  rate_counters: true
  period: 10m

(note that I provide a period)

For the last 30 minutes or so, the rates have been calculated successfully:

[Screenshot 2024-03-21 at 7:03:46 PM]

gizas commented Mar 22, 2024

For Elastic Agent installations, there seems to be a combination of issues:

  1. The Prometheus assets need to be installed from the Kibana UI.
    Verify the mapping below:
    [Screenshot 2024-03-22 at 1:41:38 PM]

  2. Apply the period setting in remote_write.
    (For reference, I am testing with Prometheus integration v1.15.0 and Elastic Stack 8.14.0.)
    With a standalone Agent:

inputs:
    - name: prometheus
      type: prometheus/metrics
      use_output: default
      meta:
        package:
          name: prometheus
          version: 1.15.0
      data_stream:
        namespace: default
      streams:
        - data_stream:
            dataset: prometheus.remote_write
            type: metrics
          metricsets:
            - remote_write
          host: '0.0.0.0'
          port: 9201
          rate_counters: true
          use_types: true
          period: 1m

Next Steps:

  1. Cover Prometheus Remote Write in Managed Agents (Elastic Agent):
    a. Update the documentation.
    b. Check how and where Kibana updates the generated manifests, and verify that period is present.

  2. Open a PR in Beats to apply a default period value for remote_write.

  3. Document the period parameter for remote_write in https://github.com/elastic/beats/blob/main/x-pack/metricbeat/metricbeat.reference.yml#L1320

  4. Check why there are conflicts with the default metrics mapping, and how to solve them.
