-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: maybe fix flaky test #8457
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @vpapavas - can you add a description about why you think this fixes the test?
Thank you @agavra ! I updated the description. Fingers crossed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks Vicky - this is a real bug, right? Not just a testing one? We should be able to confirm it by adding a sleep somewhere to recreate the race.
@@ -78,9 +76,23 @@ public void setQueryHandle(final QueryHandle queryHandle, final boolean isPullQu | |||
this.queryId = queryHandle.getQueryId(); | |||
this.queue.setQueuedCallback(this::maybeSend); | |||
this.queue.setLimitHandler(() -> { | |||
if (isPullQuery) { | |||
if (queryHandle.getConsistencyOffsetVector().isPresent()) { | |||
log.info("Limit handler: Add consistency token to queue"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is production code, so this logging will be logged in production. Is this helpful to have? If so, then let's at least include the queryID so that it's clear which query it is.
this comment applies to most of these log messages in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These logs will only happen if consistency is enabled and currently nobody will enable it since it is not available yet. Moreover, these are there only to debug the problem. Once it doesn't happen anymore, I will remove most of them
if (queryHandle.getConsistencyOffsetVector().isPresent()) { | ||
log.info("Limit handler: Add consistency token to queue"); | ||
} | ||
queryHandle.getConsistencyOffsetVector().ifPresent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we just move this into the if block above and just call .get()
?
@@ -90,9 +102,23 @@ public void setQueryHandle(final QueryHandle queryHandle, final boolean isPullQu | |||
// we should be returning a "Limit Reached" message as we do in the HTTP/1 endpoint when | |||
// we hit the limit, but for query completion, we should just end the response stream. | |||
this.queue.setCompletionHandler(() -> { | |||
if (isPullQuery) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the "Justification for duplicated code" comment above, but given this PR do we think it makes sense to just have them use the same code and if we ever change it we can re-intro the duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, this code change is quite unrelated to this PR :) Second, I think we should keep it since CP is adding LIMIT
support for pull queries and we would want the completion to be handled differently, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limit stuff is already merged :) but feel free to go about it either way
I managed to replicate the problem locally by adding a sleep in the |
4060b8d
to
cc51c19
Compare
* remove logs from test
Description
I added more logs and made a change to the code that adds the consistency token to the queue.
The problem stems from the fact that rows are streamed asynchronously and the limit handler is also set asynchronously. Hence there is a race on whether the limit handler is called first or not.
The code path where this was happening before would not be called if the limit handler was called first on an empty queue. So, what might be happening is that we poll the last row of data from the queue but the complete flag has not been set yet hence the consistency token is not added. Then, the limit handler gets called and sets the complete flag and also initiates complete on the subscriber. Only then, can the consistency token be added but we have already completed the response so it is lost.
It is not failing locally because the code path that is followed is that the limit handler is called first and sets the complete flag to true but the queue is not empty. Then, we poll from the queue which makes the queue empty. Now, since the complete flag is true, the consistency token is added to the queue as well.
Testing done
Describe the testing strategy. Unit and integration tests are expected for any behavior changes.
Reviewer checklist