Liveness and readiness probes are failing with 503 in compactor and prometheus after upgrading to 0.17.0 #3466

Closed
dimm0 opened this issue Nov 18, 2020 · 7 comments

dimm0 commented Nov 18, 2020

Thanos, Prometheus and Golang version used:

Thanos 0.17.0, Prometheus 2.22.2

Object Storage Provider:
Ceph S3

What happened:
After upgrading Thanos to 0.17.0, the liveness probes for the Prometheus and thanos-compactor pods started failing with HTTP 503, which causes the pods to restart.

What you expected to happen:
Pods keep running

How to reproduce it (as minimally and precisely as possible):
Not sure what triggered it; I simply upgraded a working installation.

Full logs to relevant components:
I couldn't find any errors in the logs: the container simply restarts and nothing is logged.

Author

dimm0 commented Nov 18, 2020

The problem went away after reverting to 0.16.0.

kesor commented Nov 24, 2020

Running thanos compact --http-address=0.0.0.0:10902 does not open a listening socket. I checked by exec'ing into the container running compact and running netstat -na | grep LISTEN.
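
The check described above can also be scripted from any machine that can reach the pod's HTTP port. The following is only a rough sketch, not something taken from this thread: it assumes the probes target the /-/healthy and /-/ready paths on the port given to --http-address (the issue does not say which paths the probes use), and the helper names port_is_listening, probe_status and check are made up for illustration.

```python
import http.client
import socket
import urllib.error
import urllib.request


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False


def probe_status(url: str, timeout: float = 2.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers, such as the 503 reported here, are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return None


def check(host: str = "127.0.0.1", port: int = 10902) -> dict:
    """Report whether the port accepts connections and what each probe path returns.

    The two paths are an assumption about how the probes are configured.
    """
    listening = port_is_listening(host, port)
    result = {"listening": listening}
    for path in ("/-/healthy", "/-/ready"):
        result[path] = probe_status(f"http://{host}:{port}{path}") if listening else None
    return result
```

Calling check("127.0.0.1", 10902) against a forwarded port separates the two failure modes: "listening" being False matches the "doesn't listen" observation, while "listening" being True with a 503 on one of the paths would mean the server is up but reporting itself unhealthy or not ready.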

Kampe commented Nov 30, 2020

I see the same issue with thanos store in v0.16; I did not see it in v0.15.

Contributor

ahurtaud commented Dec 2, 2020

We have the same issue, I think.
What is weird is that it works fine for some buckets, while for others the pod gets killed by the liveness probe.

bucket config: Scality S3

Fails on 0.17.1 and works on 0.16.0

Contributor

ahurtaud commented Dec 2, 2020

Dupe of #3395?

Member

bwplotka commented Dec 2, 2020

Yup, that looks like the same issue.

Thank you all for reporting this, let's fix and release 0.17.2 (:

Also commented on the mentioned issue. Closing this one as dup 🤗

@bwplotka bwplotka closed this as completed Dec 2, 2020
Kampe commented Dec 3, 2020

@bwplotka I am seeing the same behavior in thanos-store after upgrading to 0.17 as well. It is not the same component, but the pods end up redeploying after liveness checks fail for no apparent reason.
