consul_autopilot_healthy is reporting zero on the leader #11152

Closed
arielazem opened this issue Sep 27, 2021 · 3 comments
Labels
theme/telemetry Anything related to telemetry or observability type/bug Feature does not function as expected

Comments

@arielazem

Overview of the Issue
With Consul v1.10.1 (Revision db839f1), the metric consul_autopilot_healthy is zero on all the servers, including the leader, even when autopilot reports that the cluster is healthy.

Screen Shot 2021-09-27 at 11 36 42 AM

Zero on all the servers

Screen Shot 2021-09-27 at 11 43 00 AM

The autopilot health endpoint reports the cluster is healthy.

Screen Shot 2021-09-27 at 11 38 44 AM
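
One way to check for the mismatch described above is to compare the gauge with the `Healthy` field returned by `/v1/operator/autopilot/health`. This is a minimal sketch, not part of the original report: it assumes a local agent on 127.0.0.1:8500 without ACLs, `fetch_health` and `gauge_contradicts_health` are hypothetical helper names, and the JSON sample is trimmed for illustration rather than captured from a real cluster.

```python
import json
import urllib.request

# Assumed agent address; adjust for your environment.
HEALTH_URL = "http://127.0.0.1:8500/v1/operator/autopilot/health"


def fetch_health(url=HEALTH_URL, timeout=5):
    """Return the raw JSON body of the autopilot health endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def gauge_contradicts_health(gauge_value, health_body):
    """True when the endpoint says healthy but the gauge is not 1."""
    endpoint_healthy = json.loads(health_body).get("Healthy") is True
    return endpoint_healthy and gauge_value != 1


# Illustrative, trimmed response body (assumption, not real output).
SAMPLE_HEALTH = '{"Healthy": true, "FailureTolerance": 1, "Servers": []}'

# The situation reported in this issue: endpoint healthy, gauge at 0.
print(gauge_contradicts_health(0.0, SAMPLE_HEALTH))  # prints True
```

Against a live agent, the body would come from `fetch_health()` and the gauge value from the Prometheus metrics output.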

Reproduction Steps
Steps to reproduce this issue:

1. Run Consul v1.10.1.
2. Go to /v1/agent/metrics?format=prometheus and check the value of consul_autopilot_healthy.

Operating system and Environment details
Linux VM in GCP
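
The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: it assumes a local agent on 127.0.0.1:8500 without ACLs, `fetch_metrics` and `parse_gauge` are hypothetical helper names, and the sample text is illustrative rather than captured output.

```python
import urllib.request

# Assumed agent address; adjust for your environment.
METRICS_URL = "http://127.0.0.1:8500/v1/agent/metrics?format=prometheus"


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Return the Prometheus-format metrics text from the agent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_gauge(text, metric_name):
    """Return the first sample value for metric_name, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        if "{" in line and "}" in line:
            # Labeled sample: name{labels} value [timestamp]
            name = line.split("{", 1)[0]
            remainder = line.rsplit("}", 1)[1]
        else:
            # Unlabeled sample: name value [timestamp]
            name, _, remainder = line.partition(" ")
        if name == metric_name:
            fields = remainder.split()
            if fields:
                return float(fields[0])  # float() also accepts "NaN"
    return None


# Illustrative sample (assumption), shaped like the behavior reported here.
SAMPLE = """\
# TYPE consul_autopilot_healthy gauge
consul_autopilot_healthy 0
"""

print(parse_gauge(SAMPLE, "consul_autopilot_healthy"))  # prints 0.0
```

Against a live agent, `parse_gauge(fetch_metrics(), "consul_autopilot_healthy")` would return the current reading on that server.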

@mikemorris
Contributor

mikemorris commented Sep 27, 2021

Dupe of #10730, or is this distinct?

arielazem changed the title from "consul_autopilot_healthy is zero on all the nodes" to "consul_autopilot_healthy is reporting zero on the leader" on Sep 27, 2021
@arielazem
Author

arielazem commented Sep 27, 2021

@mikemorris, #10730 is about all the followers reporting 0, while this ticket (#11152) is about the leader reporting 0 as well. The two might be related, but this is a different issue.

jkirschner-hashicorp added the theme/telemetry and type/bug labels on Sep 27, 2021
@acpana
Contributor

acpana commented Sep 29, 2021

Hey @arielazem, thanks for opening this issue with us!

I believe the underlying cause is a change in one of our dependencies (go-metrics). It manifests as metrics that previously reported NaN (whether in a certain stage of leadership or not) now showing up with a value of 0.
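
To illustrate why a NaN turning into 0 matters to anyone consuming the metric, here is a small sketch. The `is_unhealthy` check is a hypothetical alert condition, not something taken from Consul or go-metrics; it only shows that a rule written as "value equals 0" stays quiet on NaN but fires on 0.

```python
import math


def is_unhealthy(gauge_value):
    """Hypothetical alert rule: fire when the gauge reads exactly 0."""
    return gauge_value == 0


def has_data(gauge_value):
    """NaN means 'no meaningful reading'; any other float is real data."""
    return not math.isnan(gauge_value)


previous_reading = float("nan")  # what the metric used to report
current_reading = 0.0            # what it reports after the change

print(is_unhealthy(previous_reading), is_unhealthy(current_reading))
# prints: False True
```

Under that assumption, the same server state that was previously ignored would now look like an unhealthy cluster.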

While the team works on a fix, we'll track progress on #10730, so I will close this issue as a duplicate. Please feel free to re-open it as needed.


For anyone who comes across this later, I added more detailed steps for reproducing the metrics problem here:

#10730 (comment)
