
Steady memory consumption increase of about 2MB/h #859

Closed
xguerin opened this issue Mar 19, 2018 · 24 comments

@xguerin

xguerin commented Mar 19, 2018

Host operating system: output of uname -a

Linux xxx.xxx.xxx.xxx.xxx 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 13 10:46:25 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

Image: quay.io/prometheus/node-exporter:v0.15.2

node_exporter command line flags

--web.listen-address=127.0.0.1:9101
--path.procfs=/host/proc
--path.sysfs=/host/sys

Are you running node_exporter in Docker?

Yes, through the kube-prometheus operator.

What did you do that produced an error?

Just let node_exporter keep running.

What did you expect to see?

A memory consumption stabilizing around 50MB.

What did you see instead?

An increase in memory consumption of about 2MB/h.

[image: graph showing steadily increasing memory consumption]

@daimon99

daimon99 commented Mar 20, 2018

I have encountered this problem too. Is this a memory leak?

@xguerin
Author

xguerin commented Mar 20, 2018

Here is the memory consumption after running overnight:
[image: overnight memory consumption graph]
The rate seems to be lower than initially reported, but there is definitely some memory leaking.

@SuperQ
Member

SuperQ commented Mar 20, 2018

What do you get for the metric process_resident_memory_bytes? Does it match the container graph?

It would help to get a pprof memory allocation graph.

https://blog.golang.org/profiling-go-programs
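For reference, a quick check against the exporter's own metrics endpoint looks roughly like this (a sketch, assuming the default port 9100; the reporter's setup listens on 127.0.0.1:9101):

# RSS as seen by the exporter itself
curl -s http://localhost:9100/metrics | grep '^process_resident_memory_bytes'

# Go heap currently in use, for comparison with the container graph
curl -s http://localhost:9100/metrics | grep '^go_memstats_heap_inuse_bytes'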

@grobie
Member

grobie commented Mar 20, 2018

Our memory consumption with version 0.15.3 is stable, but ranges from 7MB minimum to 60MB maximum across thousands of nodes. We disable the ipvs, xfs and zfs collectors.
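For anyone who wants to try the same setup, the per-collector boolean flags can be used to switch those collectors off (a sketch; the flag form below matches the 0.16.0 command lines shown later in this thread, and older 0.15.x releases may differ):

node_exporter --no-collector.ipvs --no-collector.xfs --no-collector.zfs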

@xguerin
Author

xguerin commented Mar 20, 2018

Here is the graph for my 4 node-exporter instances over the past day:

[image: memory usage graph for the four node-exporter instances]

EDIT: updated graph.

@SuperQ
Member

SuperQ commented Mar 20, 2018

@xguerin Your screenshot cut off the Y axis values, so I have no idea what it means.

@xguerin
Author

xguerin commented Mar 20, 2018

Apologies, I did not realize that. I have updated the post.

@SuperQ
Member

SuperQ commented Mar 20, 2018

Those numbers seem reasonable. As @grobie said, the amount of memory used by the exporter depends on which collectors you're using and how much data they need to gather. There is probably some optimization we could do, but it doesn't look like a clear memory leak. As you can see in the graph, the Go GC is doing some work and reducing the process size from time to time.

pprof samples are what we would need to debug further and see whether any specific collectors are using up or leaking memory.
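One way to watch the GC behaviour without a full profile is to compare the exporter's own Go runtime metrics over time (a sketch, assuming the default Go client metrics are exposed, with the port adjusted to your --web.listen-address):

# bytes currently allocated vs. heap held from / returned to the OS
curl -s http://localhost:9100/metrics | grep -E '^go_memstats_(alloc_bytes|heap_inuse_bytes|heap_released_bytes) '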

@csawyerYumaed

/usr/sbin/node_exporter --version
node_exporter, version 0.15.0 (branch: HEAD, revision: 4d7aa57)
build user: root@maya
build date: 20171103-04:16:00
go version: go1.8

[image: node_exporter memory usage graph]

@SuperQ
Member

SuperQ commented Mar 21, 2018

@csawyerYumaed We really need a pprof dump in order to figure out what's going on.

You can follow the same basic steps as this blog post but gather the data from the node_exporter rather than Prometheus itself.

@csawyerYumaed

Do I run that go tool pprof command in the source directory of node_exporter?

@brian-brazil
Contributor

You don't need the source to run it, so you can run it from anywhere.
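For example, something along these lines works from any machine that can reach the exporter (a sketch, assuming the pprof endpoints are served on the exporter's listen address, default port 9100):

# interactive inspection of the live heap profile
go tool pprof http://localhost:9100/debug/pprof/heap

# or render the allocation graph straight to an SVG file
go tool pprof -svg http://localhost:9100/debug/pprof/heap > heap.svg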

@csawyerYumaed

Oh, duh. Sorry, I was asleep at the wheel: it takes the URL as the argument to know where to profile. I get it. I'll work on it the next time it happens (I've already killed that process, as it's super annoying).

@csawyerYumaed

I can email the raw SVG somewhere if you want, but github won't let me post it here, and imgur and friends don't accept them either apparently.

$ ps auxww | grep node
node-ex+ 30732 32.5 1.9 2046576 650832 ? Sl 07:16 160:25 /usr/sbin/node_exporter --collector.textfile.directory /var/lib/node_exporter/textfile_collector
$ /usr/sbin/node_exporter --version
node_exporter, version 0.15.0 (branch: HEAD, revision: 0eecaa9)
build user: root@maya
build date: 20171026-03:15:58
go version: go1.8

[image: pprof heap profile graph]

@csawyerYumaed

I've upgraded to 0.15.2 and 0.15.1 on these nodes to see if they also leak. If they do, I'll post the same information for each of those versions.

@SuperQ
Member

SuperQ commented Mar 21, 2018

It would also be useful to test the 0.16.0 release candidate.

@csawyerYumaed

OK, I didn't see the RC release. I also have a node that previously leaked now running the 0.16.0 RC. If it leaks, it will probably do so overnight; I'll check on the nodes in about 16 hours from now and report back.

@csawyerYumaed

Here are the last 24 hours; you can clearly see when the nodes were upgraded to newer versions. It seems like the memory leak is gone in versions >= 0.15.1 (for me at least). I will report back if it happens again.

[image: node_exporter memory usage over the last 24 hours]

@daimon99

daimon99 commented Mar 23, 2018

Me too. Since 3/21 we have been restarting it every day to keep memory consumption down.

node_exporter --collector.supervisord --collector.textfile.directory=/data/monitor/textfile --collector.filesystem.ignored-fs-types="^(nfs4)$"

[image: memory usage graph]

@grobie
Member

grobie commented Mar 23, 2018

@daimon99 Have you upgraded to the latest version?

@daenney
Contributor

daenney commented Mar 29, 2018

I'm not seeing anything akin to a memory leak on 0.16-rc.0. On my machine it builds up to about 18MB of memory usage and is completely steady after that. I haven't disabled any collectors on 0.16; there didn't seem to be any need to.

@zerkms

zerkms commented Jun 17, 2018

The same here. Below is the graph for node_exporter v0.16.0 downloaded from the releases page (the graph covers approximately two days, June 16th and 17th).

[image: memory usage graph over approximately two days]

The corresponding pprof heap dump is here: https://www.dropbox.com/s/d48l0rdco8cl4t7/heap.svg?dl=0

The RSS of the node_exporter process grows constantly; as of this moment it's ~170MiB.

The command line we run node_exporter with:

/usr/local/bin/node_exporter --no-collector.arp --no-collector.bcache --no-collector.bonding --no-collector.conntrack --no-collector.edac --no-collector.entropy --no-collector.filefd --no-collector.hwmon --no-collector.infiniband --no-collector.ipvs --no-collector.loadavg --no-collector.mdadm --no-collector.sockstat --no-collector.nfs --no-collector.stat --no-collector.time --no-collector.timex --no-collector.vmstat --no-collector.wifi --no-collector.xfs --no-collector.zfs --no-collector.cpu --collector.supervisord --collector.filesystem.ignored-fs-types=^fuse.lxcfs|tmpfs$ --collector.supervisord.url=http://localhost:9001/RPC2 --collector.textfile.directory=/var/lib/redacted/prometheus/textfile

And this memory usage pattern is not unique: there are a few other instances of node_exporter that behave exactly the same way.

It looks like it's caused by the supervisord collector on hosts where supervisord is not installed or not running. Every time the collector tries to scrape it, some memory leaks in the xmlrpc (?) client or in the code that uses it.
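If that is the cause, it should be straightforward to confirm (a sketch, assuming the default go_goroutines metric is exposed and that every request to /metrics triggers a collection):

# goroutine count should keep climbing while the supervisord collector is failing
curl -s http://localhost:9100/metrics | grep '^go_goroutines'
sleep 60
curl -s http://localhost:9100/metrics | grep '^go_goroutines'

# and the growth should stop when node_exporter is restarted without --collector.supervisord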

@SuperQ
Member

SuperQ commented Jun 18, 2018

@zerkms Interesting, thanks for the quality bug report. Yes, it looks like the xmlrpc library is leaky, or the supervisord collector doesn't close the connections properly and causes a leak.

@SuperQ
Member

SuperQ commented Jun 18, 2018

One thing I discovered so far is that the supervisord collector leaks a goroutine on every scrape. I've tried adding some explicit RPC Close() calls, but I'm starting to think the source of the bug is in the upstream library.
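To see where the leaked goroutines are parked, the goroutine profile can be pulled the same way as the heap profile (a sketch, assuming the pprof endpoints are exposed on the exporter's port):

# full text dump of all goroutine stacks; take one, scrape a few times, take another and diff
curl -s 'http://localhost:9100/debug/pprof/goroutine?debug=2' > goroutines.txt

# or inspect the same profile interactively
go tool pprof http://localhost:9100/debug/pprof/goroutine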
