
Grafana Prometheus streams dashboard shows incorrect values when using multiple server instances #5357

Open
klopfdreh opened this issue May 31, 2023 · 6 comments
Labels: status/need-triage Team needs to triage and take a first look

@klopfdreh
Contributor

klopfdreh commented May 31, 2023

Description:
There is currently an issue with the Prometheus metrics of the SCDF server. For example, http_server_requests_seconds_max shows the value 0.0 for every path, even while I navigate through the UI.

Release versions:
2.10.3

Custom apps:

Steps to reproduce:
Set up spring-cloud-dataflow-server with prometheus-rsocket-proxy and check the /metrics/connected endpoint.

Screenshots:

Note: We created our own artifact based on https://github.com/spring-cloud/spring-cloud-dataflow/tree/v2.10.3/spring-cloud-dataflow-server. That is why version 1.0.63 is mentioned.

Additional context:
The metrics are provided, but the counting somehow does not work.

This is a standard Spring Boot metric, so I suspect something is broken in Spring Boot 2.7.x.

@github-actions github-actions bot added the status/need-triage Team needs to triage and take a first look label May 31, 2023
@klopfdreh klopfdreh changed the title Spring Cloud Data Flow Server Prometheus metrics are not showing the correct values Spring Cloud Data Flow Server with Prometheus-RSocket-Proxy metrics are not showing the correct values May 31, 2023
@klopfdreh
Contributor Author

I found the issue: it occurs when you scale the instances in Kubernetes up to 2 and both servers export their metrics under the same application name:

management:
  metrics:
    tags:
      application: myservername
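To illustrate why an identical application tag is a problem, here is a toy model in Python. It only assumes the Prometheus data model, where a series is identified by its metric name plus its full label set; the "later sample replaces the earlier one" behaviour, the path /about, and the sample values are hypothetical and do not describe how prometheus-rsocket-proxy or Prometheus actually resolve duplicates. The point is that two pods with the same tags produce the same series identity and cannot be told apart.

```python
from typing import Dict, FrozenSet, List, Tuple

# A sample is (metric name, labels, value).
Sample = Tuple[str, Dict[str, str], float]
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    # A series is identified by its metric name plus its full label set.
    return (name, frozenset(labels.items()))


def merge_exports(samples: List[Sample]) -> Dict[SeriesKey, float]:
    # Toy assumption: samples are stored by series identity, so a later
    # sample with the same identity silently replaces an earlier one.
    store: Dict[SeriesKey, float] = {}
    for name, labels, value in samples:
        store[series_key(name, labels)] = value
    return store


def exports(app_a: str, app_b: str) -> List[Sample]:
    # Hypothetical samples from two server pods for the same path.
    metric = "http_server_requests_seconds_max"
    uri = "/about"  # hypothetical path
    return [
        (metric, {"application": app_a, "uri": uri}, 0.42),  # pod that served the UI request
        (metric, {"application": app_b, "uri": uri}, 0.0),   # idle pod
    ]


if __name__ == "__main__":
    # Same tag on both pods: one series survives, showing the idle pod's 0.0.
    print(merge_exports(exports("myservername", "myservername")))
    # Distinct tags per pod: both series are kept.
    print(merge_exports(exports("myservername-1", "myservername-2")))
```

Giving each pod a distinct application tag keeps the series separate, which is what the follow-up comments propose.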

@klopfdreh
Contributor Author

I got the dashboard from here: https://grafana.com/grafana/dashboards/9933-streams/. It could be changed to check that the application name starts with a given pattern, so that you can name the applications myservername-1 and myservername-2, or myservername-randomidentifier.

@klopfdreh
Contributor Author

The dashboard should be adjusted to use the =~ (regex) matcher in its metric queries.

Variable Value: SERVER_APPLICATION_NAME=myservername.* (the .* is important to match all pods)
Env-Variable: MY_POD_NAME = myservername-3h35f2t3d-rcg8d

Example:

"expr": "process_uptime_seconds{application=~\"${SERVER_APPLICATION_NAME}\"}",

application.yml

management:
  metrics:
    tags:
      application: ${MY_POD_NAME}

SCDF deployment env-variables:

            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
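The difference between application="..." and application=~"..." can be sketched in Python. Prometheus regex matchers are anchored at both ends, so re.fullmatch is used here as an approximation (Prometheus uses RE2 syntax, which agrees with Python's re for a simple pattern like myservername.*). The first pod name is the MY_POD_NAME example above; the other two are made up for illustration, and the function names are hypothetical.

```python
import re


def equality_match(label_value: str, expected: str) -> bool:
    # Models application="...": exact string comparison.
    return label_value == expected


def regex_match(label_value: str, pattern: str) -> bool:
    # Models application=~"...": the regex must match the whole label value,
    # which re.fullmatch mirrors for simple patterns.
    return re.fullmatch(pattern, label_value) is not None


# Hypothetical application tag values (first one from the example above).
pods = [
    "myservername-3h35f2t3d-rcg8d",
    "myservername-3h35f2t3d-x7k2p",
    "otherservice-5c6d7e8f9-abcde",
]

if __name__ == "__main__":
    for pod in pods:
        print(pod, equality_match(pod, "myservername"), regex_match(pod, "myservername.*"))
```

Because of the anchoring, the trailing .* in SERVER_APPLICATION_NAME matters: the bare pattern myservername would not match any pod name.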

@klopfdreh
Contributor Author

Hope this helps with Kubernetes setups that have more than one replica. 👍

@klopfdreh
Contributor Author

Alternatively, you could add a selection to the dashboard for choosing between the servers.

@cppwfs cppwfs removed the status/need-triage Team needs to triage and take a first look label Jun 1, 2023
@cppwfs cppwfs added the status/need-triage Team needs to triage and take a first look label Jun 1, 2023
@onobc onobc changed the title Spring Cloud Data Flow Server with Prometheus-RSocket-Proxy metrics are not showing the correct values Grafana Prometheus streams dashboard shows incorrect values when using multiple replicas Jun 5, 2023
@onobc onobc changed the title Grafana Prometheus streams dashboard shows incorrect values when using multiple replicas Grafana Prometheus streams dashboard shows incorrect values when using multiple server instances Jun 5, 2023
@onobc
Contributor

onobc commented Jun 5, 2023

We could implement @klopfdreh's suggested fix (or something similar) in:

  1. Dashboard(s) we provide in SCDF repo
  2. Dashboard(s) in Grafana labs (https://grafana.com/grafana/dashboards/9933-streams/)

I am not sure what is involved in the 2nd item.
