
The node-exporter stops with no message #1090

Closed

wupj1993 opened this issue Sep 29, 2018 · 18 comments

@wupj1993

Host operating system: output of uname -a

Linux k8s-node-1 4.18.10-1.el7.elrepo.x86_64 #1 SMP Wed Sep 26 16:20:39 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

build user: root@a67a9bc13a69
build date: 20180515-15:52:42
go version: go1.9.6

node_exporter command line flags

Are you running node_exporter in Docker?

no

What did you do that produced an error?

:9100/metrics can't be reached and the process is gone

What did you expect to see?

I want to know where I can see a log, or anything else, about why the node-exporter doesn't work.
I used the command (./node-exporter --log.level="debug" >node_exporter.log 2>&1 &) to start my node-exporter, but I can't find anything about it in the log.
When I look for something about it in /var/log/messages, I can't find any message about a 'killed process' that mentions node-exporter.
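
One way to at least capture the exit status when starting it in the background like this is to wrap the command in a subshell (a rough sketch, reusing the file names from the command above):

# Record the exit status alongside the normal log output.
(
  ./node-exporter --log.level="debug" >node_exporter.log 2>&1
  status=$?
  echo "$(date) node-exporter exited with status ${status}" >>node_exporter.log
) &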

What did you see instead?

The process stopped.
Thanks

@jorinvo

jorinvo commented Oct 1, 2018

We are having the same issue. In our case we use systemd + journalctl, and we can't see anything suspicious in the logs. It has happened more than once, but only on one server and at no specific time of day; the last time was four days ago. I suspect it is related to some other event occurring on the machine, but I cannot explain why one process would stop without notice while everything else keeps running.
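
When the exporter runs under systemd like this, the exit code and any terminating signal are usually recoverable from journald even if the exporter itself logged nothing. A sketch, assuming the unit is called node_exporter.service:

# Last recorded exit status / signal for the unit.
systemctl status node_exporter.service
systemctl show node_exporter.service -p ExecMainStatus -p ExecMainCode

# Everything journald captured for the unit, plus kernel messages
# from around the time it died (adjust the date).
journalctl -u node_exporter.service --no-pager
journalctl -k --since "2018-09-27" --no-pager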

@pgier
Contributor

pgier commented Oct 2, 2018

@BigDuck based on your command you should see logs in node_exporter.log?
@jorinvo Can you try running with debug logging enabled (--log.level="debug")?

@jorinvo

jorinvo commented Oct 3, 2018

@pgier thanks, will try. Not sure when the next time it dies will be, though.

@discordianfish
Member

This is really surprising. Do you know which exit code it exited with?
Also check dmesg / /var/log/kern.log; I'd suspect it got killed by something.
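
A quick way to check for an external kill such as the OOM killer (a sketch; log file paths vary by distribution):

# Look for OOM-killer activity or other kernel-level kills.
dmesg -T | grep -iE 'oom|killed process|node_exporter'
grep -iE 'oom|killed process' /var/log/kern.log /var/log/messages 2>/dev/null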

@wupj1993
Author

@BigDuck based on your command you should see logs in node_exporter.log?
@jorinvo Can you try running with debug logging enabled (--log.level="debug")?

Thank you for your reply. It has been too busy over the past few days, so I didn't see your reply. I can't find any message in my node-exporter. But now it's working well; maybe it's because our operations person added memory chips. However, I remember the server having enough memory, at least more than 1GB free. Anyway, it is working well now. If I find something I will tell you. Thanks again.

@discordianfish
Member

Possibly a dup of #1008

@youjia0721

youjia0721 commented Oct 11, 2018

I have the same issue too... and I don't think it's because of memory usage...
Maybe other reasons?
It happened twice in one day on two virtual servers, and it seems to have happened at the same time.
(screenshots attached)

@discordianfish
Member

@youjia0721 Do you know the exit code of the process? Maybe there is more in the log?
Besides that, can you also check dmesg or /var/log/kern.log for other possible reasons that might have caused it to get killed?
Also, running the node-exporter with --log.level="debug" might get us more details.

@jorinvo

jorinvo commented Oct 11, 2018

For us, we ran it with the debug flag; it crashed again, but there were no suspicious logs in any of the places.
We have now updated from 0.15.2 to 0.16.0 and are waiting for the problem to happen again (or not!).
Also, the server we run this on is a VM and it runs GitLab (we are not using GitLab's node_exporter though).
Our CPU and memory didn't have any spikes and had enough buffer left.

@phyber
Contributor

phyber commented Oct 12, 2018

This has been happening to me for a while too; the latest occurrence was last night, so here's some output.
The exporter was not killed by the Linux OOM killer, and dmesg contains no interesting lines regarding the exporter.

Other useful information: in my case, Prometheus isn't contacting this particular node_exporter directly; rather, it is proxied by nginx, which terminates TLS and provides basic auth for the exporter. I don't believe this has anything to do with the error, but thought it was worth mentioning.
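
To rule the proxy in or out, one simple check is to scrape the exporter both directly and through nginx and compare the results (a sketch; the hostname, port, and credentials below are placeholders):

# Scrape the exporter directly on the host (default port 9100).
curl -s http://localhost:9100/metrics | head

# Scrape it through the nginx proxy with TLS and basic auth
# (URL and credentials are placeholders).
curl -s -u monitor:secret https://node.example.com/metrics | head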

Version output

# prometheus-node-exporter --version
node_exporter, version 0.16.0+ds (branch: debian/sid, revision: 0.16.0+ds-1)
  build user:       [email protected]
  build date:       20180613-19:00:23
  go version:       go1.10.2

daemon.log

# grep node-exporter ../daemon.log
Oct 12 01:06:44 hostname prometheus-node-exporter: prometheus-node-exporter: client (pid 18148) exited with 2 status

Node Exporter log

fatal error: systemstack called from unexpected goroutine
runtime.throw(0x9df740, 0x17)
        /usr/lib/go-1.10/src/runtime/panic.go:616 +0x81
runtime.schedule()
        /usr/lib/go-1.10/src/runtime/proc.go:2489 +0x351
runtime.mstart1(0xc400000000)
        /usr/lib/go-1.10/src/runtime/proc.go:1237 +0x9e
runtime.mstart()
        /usr/lib/go-1.10/src/runtime/proc.go:1193 +0x76

goroutine 1 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f63b5b3cf00, 0x72, 0x0)
        /usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420142898, 0x72, 0xc4204b0700, 0x0, 0x0)
        /usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc420142898, 0xffffffffffffff00, 0x0, 0x0)
        /usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc420142880, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/go-1.10/src/internal/poll/fd_unix.go:372 +0x1a8
net.(*netFD).accept(0xc420142880, 0xc42008e120, 0xc42004bbe0, 0x402dc8)
        /usr/lib/go-1.10/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc420134670, 0xc42004bc10, 0x401d27, 0xc42008e120)
        /usr/lib/go-1.10/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).AcceptTCP(0xc420134670, 0xc42004bc58, 0xc42004bc60, 0x18)
        /usr/lib/go-1.10/src/net/tcpsock.go:246 +0x49
net/http.tcpKeepAliveListener.Accept(0xc420134670, 0x9ff8f8, 0xc42008e0a0, 0xa49040, 0xc4201b1680)
        /usr/lib/go-1.10/src/net/http/server.go:3216 +0x2f
net/http.(*Server).Serve(0xc4201dc750, 0xa48d00, 0xc420134670, 0x0, 0x0)
        /usr/lib/go-1.10/src/net/http/server.go:2770 +0x1a5
net/http.(*Server).ListenAndServe(0xc4201dc750, 0xc4201dc750, 0x2)
        /usr/lib/go-1.10/src/net/http/server.go:2711 +0xa9
net/http.ListenAndServe(0x7ffdb9891ed1, 0xe, 0x0, 0x0, 0x1, 0xc4201da200)
        /usr/lib/go-1.10/src/net/http/server.go:2969 +0x7a
main.main()
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/node_exporter/node_exporter.go:112 +0x9cf

goroutine 1184119 [runnable]:
fmt.(*pp).doPrintf(0xc4204760c0, 0x9d5ac1, 0x9, 0xc420311468, 0x3, 0x3)
        /usr/lib/go-1.10/src/fmt/print.go:951 +0x11c4
fmt.Fprintf(0xa42c80, 0xc4200ba9a0, 0x9d5ac1, 0x9, 0xc420311468, 0x3, 0x3, 0xc4200a6100, 0x1d, 0x100)
        /usr/lib/go-1.10/src/fmt/print.go:188 +0x72
github.com/prometheus/common/expfmt.labelPairsToText(0xc4201347c0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0xa42c80, 0xc4200ba9a0, 0x0, ...)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/text_create.go:261 +0x228
github.com/prometheus/common/expfmt.writeSample(0xc4200245e0, 0x1d, 0xc420333800, 0x0, 0x0, 0x0, 0x0, 0x3ff0000000000000, 0xa42c80, 0xc4200ba9a0, ...)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/text_create.go:214 +0x15f
github.com/prometheus/common/expfmt.MetricFamilyToText(0xa42c80, 0xc4200ba9a0, 0xc420218690, 0x8c3, 0x0, 0x0)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/text_create.go:88 +0x482
github.com/prometheus/common/expfmt.NewEncoder.func4(0xc420218690, 0x0, 0x0)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/encode.go:83 +0x3d
github.com/prometheus/common/expfmt.encoder.Encode(0xc42027b7a0, 0xc420218690, 0x0, 0x0)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/common/expfmt/encode.go:36 +0x30
github.com/prometheus/client_golang/prometheus/promhttp.HandlerFor.func1(0x7f63b5b3d0e0, 0xc4201f4640, 0xc42012c300)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/client_golang/prometheus/promhttp/http.go:142 +0x2ee
net/http.HandlerFunc.ServeHTTP(0xc4201f4500, 0x7f63b5b3d0e0, 0xc4201f4640, 0xc42012c300)
        /usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1(0x7f63b5b3d0e0, 0xc4201f4640, 0xc42012c300)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:40 +0xa9
net/http.HandlerFunc.ServeHTTP(0xc420226c00, 0x7f63b5b3d0e0, 0xc4201f4640, 0xc42012c300)
        /usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1(0xa48900, 0xc420302000, 0xc42012c300)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:100 +0xda
net/http.HandlerFunc.ServeHTTP(0xc420226cc0, 0xa48900, 0xc420302000, 0xc42012c300)
        /usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
main.handler(0xa48900, 0xc420302000, 0xc42012c300)
        /build/prometheus-node-exporter-85d2jU/prometheus-node-exporter-0.16.0+ds/build/src/github.com/prometheus/node_exporter/node_exporter.go:68 +0x718
net/http.HandlerFunc.ServeHTTP(0x9ff728, 0xa48900, 0xc420302000, 0xc42012c300)
        /usr/lib/go-1.10/src/net/http/server.go:1947 +0x44
net/http.(*ServeMux).ServeHTTP(0xef6f40, 0xa48900, 0xc420302000, 0xc42012c300)
        /usr/lib/go-1.10/src/net/http/server.go:2337 +0x130
net/http.serverHandler.ServeHTTP(0xc4201dc750, 0xa48900, 0xc420302000, 0xc42012c300)
        /usr/lib/go-1.10/src/net/http/server.go:2694 +0xbc
net/http.(*conn).serve(0xc42008e0a0, 0xa48f80, 0xc4204b0000)
        /usr/lib/go-1.10/src/net/http/server.go:1830 +0x651
created by net/http.(*Server).Serve
        /usr/lib/go-1.10/src/net/http/server.go:2795 +0x27b

goroutine 1184120 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7f63b5b3ce30, 0x72, 0xc42028ae58)
        /usr/lib/go-1.10/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420142098, 0x72, 0xffffffffffffff00, 0xa44d00, 0xec7858)
        /usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc420142098, 0xc420226100, 0x1, 0x1)
        /usr/lib/go-1.10/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc420142080, 0xc420226101, 0x1, 0x1, 0x0, 0x0, 0x0)
        /usr/lib/go-1.10/src/internal/poll/fd_unix.go:157 +0x17d
net.(*netFD).Read(0xc420142080, 0xc420226101, 0x1, 0x1, 0xc42046b2c0, 0x0, 0x0)
        /usr/lib/go-1.10/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc420134030, 0xc420226101, 0x1, 0x1, 0x0, 0x0, 0x0)
        /usr/lib/go-1.10/src/net/net.go:176 +0x6a
net/http.(*connReader).backgroundRead(0xc4202260f0)
        /usr/lib/go-1.10/src/net/http/server.go:668 +0x5a
created by net/http.(*connReader).startBackgroundRead
        /usr/lib/go-1.10/src/net/http/server.go:664 +0xce

@SuperQ
Member

SuperQ commented Oct 12, 2018

Interesting crash problem. Have you tried with an official binary, rather than the Debian one? The Debian build uses different vendored code to our official releases, which may introduce bugs.
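
For anyone who wants to compare against the upstream build, the official binaries are published on the project's GitHub releases page. A sketch, with the version and architecture as examples only:

# Download and unpack an official release binary
# (version/arch are examples; pick the ones matching your system).
VERSION=0.16.0
curl -sSLO "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${VERSION}.linux-amd64.tar.gz"
./node_exporter-${VERSION}.linux-amd64/node_exporter --log.level="debug"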

@phyber
Contributor

phyber commented Oct 12, 2018

I have not tried official binaries, I could. However, it takes a random period of time for the issue to trigger. Between a week and a month seems to be the usual.

Different vendored code

😬 sounds nasty. If it's believed that my issue is different to BigDuck's, I can delete my responses to avoid polluting this issue.

@youjia0721

@youjia0721 Do you know the exit code of the process? Maybe there is more in the log?
Besides that, can you also check dmesg or /var/log/kern.log for other possible reasons that might have caused it to get killed?
Also, running the node-exporter with --log.level="debug" might get us more details.

@discordianfish Thanks for your response. I have been waiting for a "node-exporter stopped" occurrence these past few days, but it hasn't happened again. Now I'm trying to run the exporter with "--log.level=debug", and if I find anything new I will post it here...
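
If the exporter is managed by systemd, one way to add the flag persistently is an override drop-in (a sketch; the unit name and ExecStart path are assumptions):

# Create an override for the service so the debug flag survives restarts.
sudo systemctl edit node_exporter.service
# In the editor that opens, add:
#   [Service]
#   ExecStart=
#   ExecStart=/usr/local/bin/node_exporter --log.level="debug"
sudo systemctl restart node_exporter.service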

@mambalex

mambalex commented Jul 1, 2020

level=warn ts=2020-07-01T05:22:40.810Z caller=cpu_linux.go:255 collector=cpu msg="CPU User counter jumped backwards" cpu=0 old_value=80126.3 new_value=80126.29

level=warn ts=2020-07-01T05:22:40.810Z caller=cpu_linux.go:255 collector=cpu msg="CPU User counter jumped backwards" cpu=1 old_value=80488.51 new_value=80488.5

I got these warnings before the node-exporter crashed.

@discordianfish
Member

@mambalex Does anything else get logged?

@srikantheee

This comment has been minimized.

@SuperQ
Member

SuperQ commented Mar 5, 2021

Please stop posting off-topic messages to issues.

This issue is very old and concerns obsolete versions. There is no conclusive evidence that there was a bug in the node_exporter.

I am going to close this out. If there is new evidence for this, please open a new issue.

@SuperQ SuperQ closed this as completed Mar 5, 2021
@prometheus prometheus locked as resolved and limited conversation to collaborators Mar 5, 2021
@SuperQ
Member

SuperQ commented Mar 5, 2021

A reminder:

For questions/help/support please use our community channels. There are more people available to potentially respond to your request and the whole community can benefit from the answers provided.
