-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't return cached heartbeat read when query service is down to avoi… #4689
Conversation
1745509
to
55d7e3d
Compare
// We return healthy from this as a signal to the healtcheck to attempt | ||
// to start the query service again | ||
if !tsv.hr.IsOpen() && !tsv.IsServing() { | ||
return 0, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to return something to indicate the reader is closed?
i.e. -1 or max int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could return -1, if we return max int, then we will get an unhealthy due to replication lag issue. We could also return a custom error that specifies the reader is closed. We would need to add code to handle -1 or the custom error inside the healthcheck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is an improvement already, I'll merge this. We can fix forward as needed.
…d situation where replica tablet never recovers. Signed-off-by: Jacob Schlather <[email protected]> Signed-off-by: <[email protected]> Signed-off-by: <[email protected]>
55d7e3d
to
0d01465
Compare
@sougou Could you take a look? Thank you. |
// We return healthy from this as a signal to the healtcheck to attempt | ||
// to start the query service again | ||
if !tsv.hr.IsOpen() && !tsv.IsServing() { | ||
return 0, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is an improvement already, I'll merge this. We can fix forward as needed.
…d situation where replica tablet never recovers.
Closes #4673
The core of the problem in 4673, is that the HeartbeatReporter does not obey the contract expected by the healthcheck - that it reports a current reading of the health. I don't see a straightforward way to fix this without rewriting the reader entirely. The next best thing I could see to do was to return healthy when we knew the reader was closed and the query service was shutdown. This will force the healthcheck to attempt to restart the query service and if that fails, then it will report that error and remain unhealthy. I considered putting this logic in the reader, but if we only know the reader is closed then there's the possibility of a pathological situation where the query service is running and the reader is closed and then we would have no visibility into this.
cc @leoxlin