Thanos query timeout #440

shirpx · 2018-07-24T10:21:23Z

Hi,
I connected a load balancer to my thanos-query servers and added it as a datasource in Grafana.
When I query 7 days back on the Redis dashboard,
I see the query servers' CPU rise, but not enough for them to die.
After that, the datasource becomes inaccessible and I can't query at all.
The datasource becomes accessible again only after I restart the thanos-query container on the query servers (it never recovers without a restart).

thanos query graphs:

timeout directly through thanos http access:

It's important to mention that my Prometheus servers can handle 7 days on the Redis dashboard when I query them directly, and it looks like thanos-query uses more Prometheus resources than querying directly does.
redisDashboard.txt
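
To compare the two paths outside Grafana, the same 7-day range query can be sent to Prometheus directly and to thanos-query, since both serve the Prometheus-compatible `/api/v1/query_range` endpoint. A minimal sketch, assuming a placeholder query (`up`) and helper names (`build_range_url`, `timed_query`) that are hypothetical and not taken from the original report:

```python
import time
import urllib.parse
import urllib.request

SEVEN_DAYS = 7 * 24 * 60 * 60  # range length in seconds


def build_range_url(base_url, query, end_ts, span=SEVEN_DAYS, step=300):
    """Build a Prometheus-compatible /api/v1/query_range URL.

    base_url is an assumption: point it at a Prometheus server or at
    thanos-query; both expose the same HTTP query API.
    """
    params = urllib.parse.urlencode({
        "query": query,
        "start": end_ts - span,
        "end": end_ts,
        "step": step,
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def timed_query(url, timeout=120):
    """Run the query and return (seconds elapsed, HTTP status or error text)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the timing covers the full response
            outcome = resp.status
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        outcome = repr(exc)
    return time.monotonic() - started, outcome
```

Calling `timed_query(build_range_url(base, "up", int(time.time())))` once with a Prometheus base URL and once with the thanos-query base URL gives a rough side-by-side of latency and failure mode for the same time range. The hostnames and the real dashboard queries would need to be filled in.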

I have 2 Prometheus servers scraping in the region I am querying, and the dashboard queries only that region.

The resources of my query servers and my Prometheus servers in that region are equal (for comparison purposes in this issue).

My query servers' resources: min 2 servers with 12 vCPUs, 76 GB (autoscaled)
My Prometheus servers' resources:
2 servers with 12 vCPUs, 76 GB

In the query logs I see this error:
level=error ts=2018-07-24T08:24:50.715483428Z caller=proxy.go:117 err="fetch series for [{monitor codelab-monitor} {replica prometheus-master-us-central1-a-1}]: rpc error: code = Canceled desc = context canceled"

There is nothing special in the sidecar logs.

bradleybluebean · 2019-09-10T00:28:58Z

Not sure if this helps, but I posted a comment on a similar-sounding issue:
#455 (comment)

krasi-georgiev · 2019-09-13T09:48:25Z

Could you try with the latest Thanos version? Bartek added remote read streaming, and it should fix this problem.
#1268

krasi-georgiev · 2019-09-13T10:02:56Z

Just noticed that the streaming PR got merged just after 2.12 was cut, so you need to wait for the Prometheus 2.13 release or just use the master image.
https://github.com/prometheus/prometheus/commits/master?after=26e8d25e0b0d3459e5901805de992acf1d5eeeaa+34

bwplotka · 2019-09-13T10:36:09Z

Wait, or use the image we prepared, which is essentially 2.12 + the remote read extended protocol: `quay.io/thanos/prometheus:v2.12.0-rc.0-rr-streaming`. It's used in production already. (:

stale · 2020-01-11T05:42:42Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jan 11, 2020

stale bot closed this as completed Jan 18, 2020
