Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: maybe fix flaky test #8457

Merged
merged 4 commits into from
Dec 9, 2021

Conversation

vpapavas
Copy link
Member

@vpapavas vpapavas commented Dec 6, 2021

Description

I added more logs and made a change to the code that adds the consistency token to the queue.

The problem stems from the fact that rows are streamed asynchronously and the limit handler is also set asynchronously. Hence there is a race on whether the limit handler is called first or not.

The code path where this was happening before would not be called if the limit handler was called first on an empty queue. So, what might be happening is that we poll the last row of data from the queue but the complete flag has not been set yet hence the consistency token is not added. Then, the limit handler gets called and sets the complete flag and also initiates complete on the subscriber. Only then, can the consistency token be added but we have already completed the response so it is lost.

It is not failing locally because the code path that is followed is that the limit handler is called first and sets the complete flag to true but the queue is not empty. Then, we poll from the queue which makes the queue empty. Now, since the complete flag is true, the consistency token is added to the queue as well.

Testing done

Describe the testing strategy. Unit and integration tests are expected for any behavior changes.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@vpapavas vpapavas requested a review from a team as a code owner December 6, 2021 18:25
Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @vpapavas - can you add a description about why you think this fixes the test?

@agavra agavra requested a review from a team December 6, 2021 19:25
@vpapavas
Copy link
Member Author

vpapavas commented Dec 7, 2021

Thank you @agavra ! I updated the description. Fingers crossed

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks Vicky - this is a real bug, right? Not just a testing one? We should be able to confirm it by adding a sleep somewhere to recreate the race.

@@ -78,9 +76,23 @@ public void setQueryHandle(final QueryHandle queryHandle, final boolean isPullQu
this.queryId = queryHandle.getQueryId();
this.queue.setQueuedCallback(this::maybeSend);
this.queue.setLimitHandler(() -> {
if (isPullQuery) {
if (queryHandle.getConsistencyOffsetVector().isPresent()) {
log.info("Limit handler: Add consistency token to queue");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is production code, so this logging will be logged in production. Is this helpful to have? If so, then let's at least include the queryID so that it's clear which query it is.

this comment applies to most of these log messages in this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logs will only happen if consistency is enabled and currently nobody will enable it since it is not available yet. Moreover, these are there only to debug the problem. Once it doesn't happen anymore, I will remove most of them

if (queryHandle.getConsistencyOffsetVector().isPresent()) {
log.info("Limit handler: Add consistency token to queue");
}
queryHandle.getConsistencyOffsetVector().ifPresent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we just move this into the if block above and just call .get()?

@@ -90,9 +102,23 @@ public void setQueryHandle(final QueryHandle queryHandle, final boolean isPullQu
// we should be returning a "Limit Reached" message as we do in the HTTP/1 endpoint when
// we hit the limit, but for query completion, we should just end the response stream.
this.queue.setCompletionHandler(() -> {
if (isPullQuery) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the "Justification for duplicated code" comment above, but given this PR do we think it makes sense to just have them use the same code and if we ever change it we can re-intro the duplication?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this code change is quite unrelated to this PR :) Second, I think we should keep it since CP is adding LIMIT support for pull queries and we would want the completion to be handled differently, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limit stuff is already merged :) but feel free to go about it either way

@vpapavas
Copy link
Member Author

vpapavas commented Dec 8, 2021

I managed to replicate the problem locally by adding a sleep in the LimitHandler before it sets the complete flag which confirms the hypothesis of why it was happening. I went ahead and removed all of the log statements since they were only needed for debugging

@vpapavas vpapavas merged commit acd80ef into confluentinc:master Dec 9, 2021
vpapavas added a commit to vpapavas/ksql that referenced this pull request Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants