
receive: High memory usage after update to 0.17.0 #3471

Closed
mxmorin opened this issue Nov 19, 2020 · 10 comments

mxmorin commented Nov 19, 2020

Thanos, Prometheus and Golang version used:
thanos, version 0.17.0 (branch: HEAD, revision: 4899d9a)
build user: root@432c5bd5f94c
build date: 20201118-17:38:06
go version: go1.15
platform: linux/amd64

Object Storage Provider:
S3

What happened:
We have 4 Thanos Receive instances. After upgrading to 0.17, memory usage increased to 100%:

[screenshot: memory usage graph after the 0.17 upgrade]


mxmorin commented Nov 19, 2020

After downgrading to 0.16:
[screenshot: memory usage graph after the downgrade]

@Danipiario

Same for me. I see the same memory consumption with version 0.17.0. Reverting to 0.16.0 stops the growth.

kakkoyun changed the title from "[RECEIVE] High memory usage after update to 0.17.0" to "receive: High memory usage after update to 0.17.0" on Dec 4, 2020

mxmorin commented Jan 11, 2021

I've retried upgrading to 0.17, but with the same effect:

  • yum upgrade thanos
  • systemctl restart thanos-query
  • systemctl restart thanos-store
  • systemctl stop thanos-receive
  • systemctl start thanos-receive

[screenshot: memory usage graph after retrying the 0.17 upgrade]

Is anyone actively looking at this issue? We can't upgrade to the latest versions because of it.


aricamf commented Jan 20, 2021

Same for me: I'm on 0.17.2 and hit exactly the same problem.


mxmorin commented Feb 2, 2021

Same effect with the new 0.18.0 release:
[screenshot: memory usage graph on 0.18.0]


svenwltr commented Feb 4, 2021

Hello! It looks like we have the same problem:

[screenshot: memory usage graph]

I tried some profiling (following #325 (comment)). What I did:

  1. Get binary to my machine: kubectl cp sts/thanos-receive:/bin/thanos ./thanos
  2. Forward HTTP port: kubectl port-forward sts/thanos-receive 10902:10902
  3. Run go tool pprof -symbolize=remote -alloc_space ./thanos "http://localhost:10902/debug/pprof/heap" and top 10:
(pprof) top 10
Showing nodes accounting for 421.25GB, 96.30% of 437.43GB total
Dropped 572 nodes (cum <= 2.19GB)
Showing top 10 nodes out of 47
      flat  flat%   sum%        cum   cum%
  199.27GB 45.55% 45.55%   199.27GB 45.55%  github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*TimeSeries).Unmarshal
   80.06GB 18.30% 63.86%    80.06GB 18.30%  github.com/golang/snappy.Decode
   75.49GB 17.26% 81.11%    77.47GB 17.71%  github.com/thanos-io/thanos/pkg/receive.(*Writer).Write
   25.18GB  5.76% 86.87%    25.18GB  5.76%  bytes.makeSlice
   18.15GB  4.15% 91.02%    18.64GB  4.26%  github.com/thanos-io/thanos/pkg/receive.(*Handler).forward
   18.07GB  4.13% 95.15%   217.34GB 49.69%  github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*WriteRequest).Unmarshal
    2.71GB  0.62% 95.77%     2.71GB  0.62%  github.com/prometheus/prometheus/tsdb/encoding.(*Decbuf).UvarintStr (inline)
    1.97GB  0.45% 96.22%     4.62GB  1.06%  github.com/prometheus/prometheus/tsdb/record.(*Decoder).Series
    0.33GB 0.076% 96.29%     2.87GB  0.66%  github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
    0.03GB 0.0064% 96.30%     2.32GB  0.53%  github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func6

Hopefully this helps a bit. If there is something more that I can profile, please tell me.


Edit:

❯ ./thanos --version
thanos, version 0.18.0 (branch: HEAD, revision: 60d45a02d46858a38013283b578017a171cf7b82)
  build user:       circleci@8ddf80c1eb30
  build date:       20210127-12:29:07
  go version:       go1.15.7
  platform:         linux/amd64
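
The top entries in the profile above (prompb.(*TimeSeries).Unmarshal, snappy.Decode, receive.(*Writer).Write) all sit on the remote-write ingestion path: each incoming request body is snappy-decompressed and unmarshalled into a WriteRequest before the series are written to the local TSDB. Below is a minimal, illustrative sketch of that decode path. It is not Thanos's actual handler code; it uses the upstream prometheus/prompb types rather than Thanos's internal storepb/prompb copy, and the port and path are placeholders.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

// handleRemoteWrite sketches the decode path the heap profile points at:
// the request body is read into memory, snappy-decompressed, and
// unmarshalled into a WriteRequest, so every request allocates buffers
// proportional to its decompressed size plus one TimeSeries per series.
func handleRemoteWrite(w http.ResponseWriter, r *http.Request) {
	compressed, err := io.ReadAll(r.Body) // buffered copy of the raw body
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// snappy.Decode allocates a fresh destination buffer when dst is nil;
	// this corresponds to the github.com/golang/snappy.Decode profile entry.
	decompressed, err := snappy.Decode(nil, compressed)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// Unmarshalling allocates labels and samples for every series; this
	// corresponds to the (*WriteRequest).Unmarshal / (*TimeSeries).Unmarshal
	// entries that dominate the profile.
	var req prompb.WriteRequest
	if err := req.Unmarshal(decompressed); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	fmt.Fprintf(w, "decoded %d series\n", len(req.Timeseries))
}

func main() {
	// Port and path are placeholders chosen to mirror a receive-style setup.
	http.HandleFunc("/api/v1/receive", handleRemoteWrite)
	log.Fatal(http.ListenAndServe(":19291", nil))
}

Because snappy.Decode(nil, ...) and the protobuf unmarshalling allocate fresh buffers and TimeSeries values per request, allocation volume on this path scales with remote-write traffic, which is consistent with the ~45% of allocations attributed to (*TimeSeries).Unmarshal in the profile above.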

@flarno11

We've upgraded to 0.17.2 to fix the WAL replay issues, but now we are also facing a dramatic increase in memory usage:
[screenshot: memory usage graph, 2021-02-12]

We had to double the system memory from 64GB to 128GB to stop OOM kills.

@kakkoyun (Member)

Has anyone tested their stack against the latest stable v0.18.0? Do you still have this issue? (Probably, but just to make sure.)

@svenwltr

> Has anyone tested their stack against the latest stable v0.18.0? Do you still have this issue? (Probably, but just to make sure.)

Yes, see the thanos --version output at the bottom of my comment above (#3471 (comment)).

@bwplotka (Member)

Let's join the discussions. We have a duplicate; we are investigating more on #3726 🤗
