Version 0.25.1 occasional startup issues #1398

Closed
JDA88 opened this issue Jan 29, 2024 · 8 comments
@JDA88
Contributor

JDA88 commented Jan 29, 2024

Sometimes when windows_exporter starts, we get this error (event log):
ts=2024-01-28T03:43:29.147Z caller=stdlib.go:105 level=error caller=http.go:144 msg="error gathering metrics: error collecting metric Desc{fqName: \"windows_exporter_collector_success\", help: \"windows_exporter: Whether the collector was successful.\", constLabels: {}, variableLabels: {collector}}: failed to prepare scrape: EOF"

The issue is that once the error has happened at startup, the service is up but unable to recover. The service should either:

  1. Crash and exit (and then can be restarted)
  2. Be able to retry and recover

Additional information

  • Restarting the service fixes the issue
  • It can happen with various collector combinations, even with simple ones (CS, CPU)
  • Very hard to reproduce as it only happens randomly (on our 700+ server deployment we have 5-10 issues a week).
  • Hard to tell when it started occurring, but it was not present in v0.22
@safster123

Just wanted to note that I'm also seeing something similar.

We also have a large number of servers and for the most part it's fine but a handful of servers will show this error.

Restarting fixes it.

@DiniFarb
Contributor

DiniFarb commented Apr 4, 2024

When I follow the error line to see where the EOF could have happened, I find myself in the perflib query func:

func QueryPerformanceData(query string) ([]*PerfObject, error) {

This function calls the following binary reader more than once:

func (p *perfObjectType) BinaryReadFrom(r io.Reader) error {
	return binary.Read(r, bo, p)
}

This reader could return EOF if the given buffer is empty. So one thing that could happen is that, for example, queryRawData returns an empty buffer here:

buffer, err := queryRawData(query)
if err != nil {
	return nil, err
}
r := bytes.NewReader(buffer)
// Read global header
header := new(perfDataBlock)
err = header.BinaryReadFrom(r)
if err != nil {
	return nil, err
}

which would then return the EOF on line 283 and result in the reported error: "failed to prepare scrape: EOF"

Since it is not really reproducible and it is very hard to guess which perflib call triggers it, my suggestion would be to add the query string to the error message in order to have more visibility and maybe a starting point for further debugging.

Maybe something like this:

if err != nil {
  return nil, fmt.Errorf("failed to read performance data block for %s with: %v", query, err)
}

@breed808 or @jkroepke do you think that would be worth adding? I am open to creating a PR :)

@billtzim

Just to mention that I also stumbled upon this issue. Occurrence ratio ~3% (1 out of 33 hosts). However, restarting the service did not fix the issue for me.

@jkroepke
Member

Are you able to test your fix?

@DiniFarb
Contributor

DiniFarb commented May 2, 2024

@jkroepke were you addressing me? I don't have a solution or fix for this; my approach would only add more visibility, which could help to find the cause of this problem. But I just saw PR #1459 and I think the effort should go in that direction instead, because if I understand it correctly, once that PR succeeds most of the perflib calls would go away anyway.


github-actions bot commented Aug 1, 2024

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Aug 1, 2024
@billtzim

billtzim commented Aug 1, 2024

Hello, regarding the problem: it went away when I updated my hosts to Windows 11.

@github-actions github-actions bot removed the Stale label Aug 2, 2024
@jkroepke
Member

jkroepke commented Sep 8, 2024

I'm going to close the issue. If it still appears, please create a new one.

@jkroepke jkroepke closed this as completed Sep 8, 2024