receive: close DBReadOnly after flushing #1856

blockloop · 2019-12-09T23:13:34Z

We are running thanos receive in production with 1w retention. When the receiver starts up, it consumes about 20% of memory on each of our 32 nodes. After the initial WAL flush, the memory footprint grows progressively over time. I originally reported this in CNCF Slack. I initially tried deleting the WAL and restarting the nodes, but that didn’t solve the issue. The only solution was to clear out some TSDB blocks.

Potentially fixes #1855

I added CHANGELOG entry for this change.
Change is not relevant to the end user.

Changes

Close the DBReadOnly connection after FlushableStorage.Flush()
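
For readers who want to see the shape of the change, here is a minimal, self-contained sketch of the "close the read-only handle after flushing" pattern. It is not the actual patch: `readOnlyDB`, `flushWAL`, and `flush` are hypothetical stand-ins, not the Thanos or Prometheus TSDB API. The point is only that the handle is released on every return path, including when the flush itself fails.

```go
package main

import (
	"errors"
	"fmt"
)

// readOnlyDB is a hypothetical stand-in for a read-only TSDB handle.
type readOnlyDB struct {
	closed bool
}

// flushWAL pretends to write the WAL contents into a block under dir.
func (db *readOnlyDB) flushWAL(dir string) error {
	if db.closed {
		return errors.New("db already closed")
	}
	if dir == "" {
		return errors.New("empty block directory")
	}
	return nil
}

// Close releases the resources held by the handle.
func (db *readOnlyDB) Close() error {
	if db.closed {
		return errors.New("db already closed")
	}
	db.closed = true
	return nil
}

// flush opens a read-only handle, flushes it, and always closes it.
// A close error is reported only if the flush itself succeeded.
func flush(open func() (*readOnlyDB, error), dir string) (err error) {
	db, err := open()
	if err != nil {
		return fmt.Errorf("open read-only db: %w", err)
	}
	defer func() {
		if cerr := db.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("close read-only db: %w", cerr)
		}
	}()

	if err := db.flushWAL(dir); err != nil {
		return fmt.Errorf("flush WAL: %w", err)
	}
	return nil
}

func main() {
	db := &readOnlyDB{}
	err := flush(func() (*readOnlyDB, error) { return db, nil }, "data")
	fmt.Println("flush error:", err, "closed:", db.closed)
}
```

Using `defer` with a named return value keeps the close next to the open, so a handle opened for a flush cannot outlive it.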

Verification

I’ve not been able to verify this just yet. I’ll try to build the image with this patch and see whether memory usage improves.

Signed-off-by: Brett Jones <[email protected]>

bwplotka

Nice! While this makes total sense, I doubt this is the potential source of the memory leak, unless you change the hashring dynamically and very often.

cc @squat tomorrow (:

blockloop · 2019-12-10T00:19:22Z

Interesting you mention that. We have a watcher that monitors our nodes and updates the hashring.json file. I noticed it picked up changes a few times. Is it bad for the hashring to update frequently? I have it do a list | sort on the available nodes and write to the hashring.json file.

squat · 2019-12-10T00:37:16Z

nice :)

No, it should absolutely not be a problem to update the file frequently. The thanos receive config watcher checks the hash of the file to see whether the content has actually changed; if not, then no update is triggered and no cost is incurred.

On the other hand, if the file is actually changing very frequently, then you will be triggering lots of TSDB flushes, which could be costly.
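
To illustrate the hash check described above, here is a rough sketch of content-based change detection. It is an assumption-laden illustration, not the Thanos config watcher: `changeDetector` is a hypothetical type, and SHA-256 is chosen only for the example. Rewriting the file with identical bytes (for instance, the same sorted node list) yields the same digest, so no reload fires.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// changeDetector remembers the digest of the last content it saw.
// Hypothetical type for illustration only.
type changeDetector struct {
	last [sha256.Size]byte
	seen bool
}

// changed reports whether content differs from the previous call.
// The first call always reports true.
func (d *changeDetector) changed(content []byte) bool {
	sum := sha256.Sum256(content)
	if d.seen && sum == d.last {
		return false
	}
	d.last = sum
	d.seen = true
	return true
}

func main() {
	d := &changeDetector{}
	// Simulated successive reads of hashring.json.
	reads := []string{
		`["node-a","node-b"]`,
		`["node-a","node-b"]`, // rewritten with identical content
		`["node-a","node-b","node-c"]`,
	}
	reloads := 0
	for _, r := range reads {
		if d.changed([]byte(r)) {
			reloads++ // this is where a TSDB flush would be triggered
		}
	}
	fmt.Println("reloads:", reloads)
}
```

Under this scheme the cost is tied to how often the content really changes, not to how often the file is written.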

blockloop force-pushed the master branch 2 times, most recently from ee73b96 to 1543aa2 Compare December 9, 2019 23:16

receive: close DBReadOnly after flushing

f30b36c

Signed-off-by: Brett Jones <[email protected]>

blockloop force-pushed the master branch from 1543aa2 to f30b36c Compare December 9, 2019 23:37

bwplotka approved these changes Dec 10, 2019

bwplotka merged commit 352cc30 into thanos-io:master Dec 10, 2019

blockloop mentioned this pull request Dec 10, 2019

Receive: Possible memory leak with TSDB growth #1855

Closed
