OS collector fails with message "failed collecting os metrics:<nil> CreateFile D:\pagefile.sys: The system cannot find the file specified." #1069

Closed
chavalikesavanand opened this issue Sep 27, 2022 · 13 comments · Fixed by #1108

chavalikesavanand commented Sep 27, 2022

We are using windows_exporter 0.19.0 with Prometheus 2.34.0.
windows_exporter_collector_success shows 0 for the os collector. All other collectors succeed.

We checked that the page file is indeed enabled only on the D: drive, with "system managed" size. It is disabled on the C: drive.

We have already tried running lodctr /R and restarting the windows_exporter Windows service many times.
The windows_exporter Windows service runs under the Local System account.

Windows exporter config:

collectors:
  enabled: cpu,cpu_info,cs,iis,logical_disk,logon,memory,net,os,process,service,system,time

image

Contributor

breed808 commented Oct 3, 2022

Strange, this should have been resolved in #954. Are you seeing any relevant output from the exporter for the os collector?

Author

chavalikesavanand commented Oct 3, 2022

Thanks @breed808 for the quick turnaround, much appreciated.
The exporter logs this error to the Windows event log (we configured the event log as the exporter's log destination) and the os collector then stops. Is it possible to continue with the other OS metrics? I also think this error log should be throttled, perhaps logged only once, or once every so many scrapes. It is cluttering our event logs.
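
For readers following the throttling suggestion above, here is a minimal sketch of "log once every N scrapes". The `everyN` type and its `Allow` method are hypothetical names made up for illustration; nothing in this thread indicates that windows_exporter implements anything like it.

```go
package main

import (
	"fmt"
	"sync"
)

// everyN is a hypothetical throttle: Allow reports true on the first call
// and then on every n-th call after it. It is safe for concurrent use.
type everyN struct {
	mu    sync.Mutex
	n     uint64
	calls uint64
}

// Allow returns whether the caller should emit its log line this time.
// An n of 0 is treated as "never throttle" to avoid a division by zero.
func (e *everyN) Allow() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.n == 0 {
		return true
	}
	allowed := e.calls%e.n == 0
	e.calls++
	return allowed
}

func main() {
	limiter := &everyN{n: 3}
	for scrape := 1; scrape <= 7; scrape++ {
		// Pretend every scrape hits the same missing page file.
		if limiter.Allow() {
			fmt.Printf("scrape %d: page file not found, skipping\n", scrape)
		}
	}
}
```

With `n` set to 3, the message is printed on scrapes 1, 4 and 7 only. Setting `n` very high would approximate "log only once".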

Contributor

breed808 commented Oct 5, 2022

I've tried to reproduce this on a spare Windows 10 system but have failed. The exporter continues without failure, with paging on D:\ set to "system managed" and paging on C:\ disabled.

Screenshot_20221005_201206

The only meaningful difference I can see is the addition of the PagingFiles key, but the absence of this shouldn't cause the os collector to fail.

ReneZeidler commented Dec 9, 2022

I'm running into the same/related issue.

The OS collector fails with the error: failed collecting os metrics:<nil> CreateFile B:\pagefile.sys: The system cannot find the file specified.

The registry value for HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\ExistingPageFiles is:

\??\B:\pagefile.sys
\??\C:\pagefile.sys
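
The `\??\` prefix on these values has to be stripped before they can be used as ordinary file paths. A minimal sketch of that step follows; `pageFilePaths` is a hypothetical helper name, and the sketch works on plain strings instead of reading the registry, so it is not necessarily how the exporter does it.

```go
package main

import (
	"fmt"
	"strings"
)

// pageFilePaths strips the `\??\` prefix from each ExistingPageFiles
// entry, leaving a path such as `B:\pagefile.sys`. Entries without the
// prefix are returned unchanged.
func pageFilePaths(values []string) []string {
	paths := make([]string, 0, len(values))
	for _, v := range values {
		paths = append(paths, strings.TrimPrefix(v, `\??\`))
	}
	return paths
}

func main() {
	// The two registry entries quoted above.
	values := []string{`\??\B:\pagefile.sys`, `\??\C:\pagefile.sys`}
	for _, p := range pageFilePaths(values) {
		fmt.Println(p)
	}
}
```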

The system is indeed configured to create a page file on B: with system managed size:
image

However, the error message is correct, and the page file doesn't actually exist on B:. The page file on C: exists with the correct size. Windows itself doesn't seem to care that the file is missing; it just doesn't show up in the Performance Monitor:
image
It reports the total page file size as 400 MB, which is just the size of the C: pagefile.

I don't know why the file doesn't exist; I couldn't find any warnings or errors in the Event Viewer. The disk is writable but completely empty. Maybe Windows decided that 0 MB is the correct size and simply doesn't create the file, who knows.

This whole situation seems a bit nonsensical and misconfigured, but it is apparently a configuration that Windows allows, and it is the one I have to deal with for now.


My issue is that this missing file causes the whole OS collector to fail.

I think an appropriate solution would be to just skip a page file if it is missing and not add anything to the total, instead of returning an error here:
https://github.com/prometheus-community/windows_exporter/blob/master/collector/os.go#L205-L208


EDIT:
I could trace this back to a regression introduced by #702. Downgrading to 0.16.0, which uses the old WMI method to get page file information, doesn't error and correctly reports the page file size.

@breed808
Contributor

Great work investigating that 👍. Float values in golang initialise at 0 so omitting the error check is fine here.

@ReneZeidler could I get you to test the feature branch in #1108 before I merge it?

@ReneZeidler

@breed808

Thanks for the fix. Unfortunately, the exporter now crashes:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x38 pc=0x14d239b]

goroutine 40 [running]:
github.com/prometheus-community/windows_exporter/collector.(*OSCollector).collect(0xc0000faa80, 0xc0000c65b8, 0x0?)
        /home/r718037/promu/windows_exporter/collector/os.go:209 +0x163b
github.com/prometheus-community/windows_exporter/collector.(*OSCollector).Collect(0x0?, 0x0?, 0x0?)
        /home/r718037/promu/windows_exporter/collector/os.go:136 +0x1d
main.execute({0x174ff2a, 0x2}, {0x185cfe0, 0xc0000faa80}, 0x0?, 0x0?)
        /home/r718037/promu/windows_exporter/exporter.go:200 +0x84
main.windowsCollector.Collect.func2({0x174ff2a, 0x2}, {0x185cfe0?, 0xc0000faa80?})
        /home/r718037/promu/windows_exporter/exporter.go:141 +0x9b
created by main.windowsCollector.Collect
        /home/r718037/promu/windows_exporter/exporter.go:139 +0x5c5

The line

fsipf += float64(file.Size())

needs to be wrapped in an else block. If I do that, everything works as expected and the OS exporter runs without error.
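
The panic is consistent with how `os.Stat` behaves: on failure it returns a nil `FileInfo` together with the error, so calling `Size()` on the result after an ignored error dereferences nil. Below is a minimal sketch of the skip-if-missing pattern. `totalPageFileBytes` is a hypothetical helper, not the exporter's actual code, and it uses an early `continue` in place of the `else` block described above, which has the same effect.

```go
package main

import (
	"fmt"
	"os"
)

// totalPageFileBytes sums the sizes of the files in paths. Files that
// cannot be stat'ed (for example because they do not exist) are skipped
// and contribute nothing to the total.
func totalPageFileBytes(paths []string) float64 {
	var total float64 // the zero value of float64 is 0
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			// info is nil here, so info.Size() would panic. Skip the file.
			continue
		}
		total += float64(info.Size())
	}
	return total
}

func main() {
	// A path that is assumed not to exist in the working directory.
	fmt.Println(totalPageFileBytes([]string{"no-such-pagefile.sys"}))
}
```

A missing file leaves the total untouched, so the collector can still report the size of the page files that do exist.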

@chavalikesavanand
Author

When will this fix be released?

Contributor

breed808 commented Jan 5, 2023

Release candidate for v0.21.0 has been made available. v0.21.0 release will be available in about a week if no blocking issues are raised.

@chavalikesavanand
Author

Any update on the release, please? We are waiting for this fix.

@chavalikesavanand
Author

@breed808 any updates on the release please?

Contributor

breed808 commented Feb 4, 2023

@chavalikesavanand did the v0.21.0 release candidate resolve your issue?

I'm hoping to have the official v0.21.0 released after merging #1133; unfortunately life has got in the way and delayed me a few weeks.

@chavalikesavanand
Author

@breed808 unfortunately this is reproducible only on our prod servers, and we cannot test there without a formal release. Hence we are eagerly awaiting the official v0.21.0 release.

@chavalikesavanand
Author

@breed808 any update on the v0.21.0 release, please?

3 participants