
Steady memory consumption increase of about 2MB/h #859

Closed
xguerin opened this issue Mar 19, 2018 · 24 comments

@xguerin

xguerin commented Mar 19, 2018

Host operating system: output of uname -a

Linux xxx.xxx.xxx.xxx.xxx 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 13 10:46:25 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

Image: quay.io/prometheus/node-exporter:v0.15.2

node_exporter command line flags

--web.listen-address=127.0.0.1:9101
--path.procfs=/host/proc
--path.sysfs=/host/sys

Are you running node_exporter in Docker?

Yes, through the kube-prometheus operator.

What did you do that produced an error?

Just let node_exporter keep running.

What did you expect to see?

A memory consumption stabilizing around 50MB.

What did you see instead?

An increase in memory consumption of about 2MB/h.

[image: graph showing steadily increasing memory consumption]

@daimon99

daimon99 commented Mar 20, 2018

I have encountered this problem too. Is this a memory leak?

@xguerin
Author

xguerin commented Mar 20, 2018

Here is the memory consumption after running overnight:
[image: overnight memory consumption graph]
The rate seems to be lower than initially reported, but there is definitely some memory leaking.

@SuperQ
Member

SuperQ commented Mar 20, 2018

What do you get for the metric process_resident_memory_bytes? Does it match the container graph?

It would help to get a pprof memory allocation graph.

https://blog.golang.org/profiling-go-programs
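For reference, a quick check against the exporter's own metrics endpoint looks roughly like this (a sketch, assuming the default port 9100; the reporter's setup listens on 127.0.0.1:9101):

# RSS as seen by the exporter itself
curl -s http://localhost:9100/metrics | grep '^process_resident_memory_bytes'

# Go heap currently in use, for comparison with the container graph
curl -s http://localhost:9100/metrics | grep '^go_memstats_heap_inuse_bytes'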

@grobie
Member

grobie commented Mar 20, 2018

Our memory consumption with version 0.15.3 is stable, but ranges from 7MB minimum to 60MB maximum across thousands of nodes. We disable the ipvs, xfs and zfs collectors.
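For anyone who wants to try the same setup, the per-collector boolean flags can be used to switch those collectors off (a sketch; the flag form below matches the 0.16.0 command lines shown later in this thread, and older 0.15.x releases may differ):

node_exporter --no-collector.ipvs --no-collector.xfs --no-collector.zfs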

@xguerin
Author

xguerin commented Mar 20, 2018

Here is the graph for my 4 node-exporter instances over the past day:

[image: memory usage graph for the four node-exporter instances]

EDIT: updated graph.

@SuperQ
Member

SuperQ commented Mar 20, 2018

@xguerin Your screenshot cut off the Y axis values, so I have no idea what it means.

@xguerin
Author

xguerin commented Mar 20, 2018

Apologies, I did not realize that. I have updated the post.

@SuperQ
Member

SuperQ commented Mar 20, 2018

Those numbers seem reasonable. As @grobie said, the amount of memory used by the exporter depends on which collectors you're using and how much data they need to gather. There is probably some optimization we could do, but it doesn't look like a clear memory leak. As you can see in the graph, the Go GC is doing some work and reducing the process size from time to time.

pprof samples are what we would need to debug further and see whether any specific collectors are using up or leaking memory.
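One way to watch the GC behaviour without a full profile is to compare the exporter's own Go runtime metrics over time (a sketch, assuming the default Go client metrics are exposed, with the port adjusted to your --web.listen-address):

# bytes currently allocated vs. heap held from / returned to the OS
curl -s http://localhost:9100/metrics | grep -E '^go_memstats_(alloc_bytes|heap_inuse_bytes|heap_released_bytes) '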

@csawyerYumaed

/usr/sbin/node_exporter --version
node_exporter, version 0.15.0 (branch: HEAD, revision: 4d7aa57)
build user: root@maya
build date: 20171103-04:16:00
go version: go1.8

[image: node_exporter memory usage graph]

@SuperQ
Member

SuperQ commented Mar 21, 2018

@csawyerYumaed We really need a pprof dump in order to figure out what's going on.

You can follow the same basic steps as this blog post but gather the data from the node_exporter rather than Prometheus itself.

@csawyerYumaed

Do I run that go tool pprof command in the source directory of node_exporter?

@brian-brazil
Contributor

You don't need the source to run it, so you can run it from anywhere.
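For example, something along these lines works from any machine that can reach the exporter (a sketch, assuming the pprof endpoints are served on the exporter's listen address, default port 9100):

# interactive inspection of the live heap profile
go tool pprof http://localhost:9100/debug/pprof/heap

# or render the allocation graph straight to an SVG file
go tool pprof -svg http://localhost:9100/debug/pprof/heap > heap.svg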

@csawyerYumaed

Oh, duh. Sorry, I was asleep at the wheel: it takes the URL as the argument to know where to profile. I get it. I'll work on it the next time it happens (I've already killed that process, as it's super annoying).

@csawyerYumaed

I can email the raw SVG somewhere if you want, but github won't let me post it here, and imgur and friends don't accept them either apparently.

$ ps auxww | grep node
node-ex+ 30732 32.5 1.9 2046576 650832 ? Sl 07:16 160:25 /usr/sbin/node_exporter --collector.textfile.directory /var/lib/node_exporter/textfile_collector
$ /usr/sbin/node_exporter --version
node_exporter, version 0.15.0 (branch: HEAD, revision: 0eecaa9)
build user: root@maya
build date: 20171026-03:15:58
go version: go1.8

[image: pprof heap profile graph]

@csawyerYumaed

I've upgraded to 0.15.2 and 0.15.1 on these nodes to see if they also leak. If they do, I'll post the same information for each of those versions.

@SuperQ
Member

SuperQ commented Mar 21, 2018

It would also be useful to test the 0.16.0 release candidate.

@csawyerYumaed

OK, I didn't see the RC release. I also have a node that previously leaked now running the 0.16.0 RC. If it leaks, it will probably do so overnight; I'll check on the nodes in about 16 hours from now and report back.

@csawyerYumaed

Here are the last 24 hours; you can clearly see when the nodes were upgraded to newer versions. It seems like the memory leak is gone in versions >= 0.15.1 (for me at least). I will report back if it happens again.

[image: node_exporter memory usage over the last 24 hours]

@daimon99

daimon99 commented Mar 23, 2018

Me too. Since 3/21 we have been restarting it every day to keep memory consumption down.

node_exporter --collector.supervisord --collector.textfile.directory=/data/monitor/textfile --collector.filesystem.ignored-fs-types="^(nfs4)$"

[image: memory usage graph]

@grobie
Member

grobie commented Mar 23, 2018

@daimon99 Have you upgraded to the latest version?

@daenney
Contributor

daenney commented Mar 29, 2018

I'm not seeing anything akin to a memory leak on 0.16-rc.0. On my machine it builds up to about 18MB of memory usage and is completely steady after that. I haven't disabled any collectors on 0.16; there didn't seem to be any need to.

@zerkms

zerkms commented Jun 17, 2018

The same here. Below is the graph for node_exporter v0.16.0 downloaded from the releases page (the graph covers approximately two days, June 16th and 17th).

[image: memory usage graph over approximately two days]

The corresponding pprof heap dump is here: https://www.dropbox.com/s/d48l0rdco8cl4t7/heap.svg?dl=0

The RSS of the node_exporter process grows constantly; as of this moment it's ~170MiB.

The command line we run node_exporter with:

/usr/local/bin/node_exporter --no-collector.arp --no-collector.bcache --no-collector.bonding --no-collector.conntrack --no-collector.edac --no-collector.entropy --no-collector.filefd --no-collector.hwmon --no-collector.infiniband --no-collector.ipvs --no-collector.loadavg --no-collector.mdadm --no-collector.sockstat --no-collector.nfs --no-collector.stat --no-collector.time --no-collector.timex --no-collector.vmstat --no-collector.wifi --no-collector.xfs --no-collector.zfs --no-collector.cpu --collector.supervisord --collector.filesystem.ignored-fs-types=^fuse.lxcfs|tmpfs$ --collector.supervisord.url=http://localhost:9001/RPC2 --collector.textfile.directory=/var/lib/redacted/prometheus/textfile

And this memory usage pattern is not unique: there are a few other instances of node_exporter that behave exactly the same way.

It looks like it's caused by the supervisord collector on hosts where supervisord is not installed or not running. Every time the collector tries to scrape it, some memory leaks in the xmlrpc (?) client or in the code that uses it.
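If that is the cause, it should be straightforward to confirm (a sketch, assuming the default go_goroutines metric is exposed and that every request to /metrics triggers a collection):

# goroutine count should keep climbing while the supervisord collector is failing
curl -s http://localhost:9100/metrics | grep '^go_goroutines'
sleep 60
curl -s http://localhost:9100/metrics | grep '^go_goroutines'

# and the growth should stop when node_exporter is restarted without --collector.supervisord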

@SuperQ
Member

SuperQ commented Jun 18, 2018

@zerkms Interesting, thanks for the quality bug report. Yes, it looks like the xmlrpc library is leaky, or the supervisord collector doesn't close the connections properly and causes a leak.

@SuperQ
Member

SuperQ commented Jun 18, 2018

One thing I discovered so far is that the supervisord collector leaks a goroutine on every scrape. I've tried adding some explicit RPC Close() calls, but I'm starting to think the source of the bug is in the upstream library.
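To see where the leaked goroutines are parked, the goroutine profile can be pulled the same way as the heap profile (a sketch, assuming the pprof endpoints are exposed on the exporter's port):

# full text dump of all goroutine stacks; take one, scrape a few times, take another and diff
curl -s 'http://localhost:9100/debug/pprof/goroutine?debug=2' > goroutines.txt

# or inspect the same profile interactively
go tool pprof http://localhost:9100/debug/pprof/goroutine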
