Optimize key-range queries in pull queries (#6105) #7993

Merged: 1 commit, Sep 6, 2021


patrickstuedi
Contributor

Description

Use range interface in state store for range pull queries, instead of doing a table scan
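To illustrate the difference, here is a minimal sketch that uses a `java.util.TreeMap` as a stand-in for a sorted state store. This is not the code from this PR: the class `RangeVsScan` and its methods are hypothetical. In Kafka Streams, the corresponding calls would be `ReadOnlyKeyValueStore.all()` for the scan and `ReadOnlyKeyValueStore.range(from, to)` for the range read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration; a TreeMap stands in for the sorted state store.
public class RangeVsScan {

  // Table scan: visit every entry and filter, analogous to iterating store.all().
  public static List<String> scan(
      final NavigableMap<Integer, String> store, final int from, final int to) {
    final List<String> out = new ArrayList<>();
    for (final Map.Entry<Integer, String> e : store.entrySet()) {
      if (e.getKey() >= from && e.getKey() <= to) {
        out.add(e.getValue());
      }
    }
    return out;
  }

  // Range read: only touches keys in [from, to], analogous to store.range(from, to).
  public static List<String> range(
      final NavigableMap<Integer, String> store, final int from, final int to) {
    return new ArrayList<>(store.subMap(from, true, to, true).values());
  }

  public static void main(final String[] args) {
    final NavigableMap<Integer, String> store = new TreeMap<>();
    for (int i = 0; i < 10; i++) {
      store.put(i, "v" + i);
    }
    // Both calls return the same rows; the range read skips keys outside the bounds.
    System.out.println(scan(store, 3, 5));
    System.out.println(range(store, 3, 5));
  }
}
```

Both methods return the same result, but the scan touches every entry while the range read only visits entries inside the bounds.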

Testing done

no testing yet

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@patrickstuedi patrickstuedi requested a review from a team as a code owner August 12, 2021 14:58
@patrickstuedi
Contributor Author

Not ready for review

@patrickstuedi patrickstuedi force-pushed the range_query1 branch 10 times, most recently from b1df1da to 776d97a Compare August 18, 2021 12:41
Member

@vvcephei vvcephei left a comment

Thanks, @patrickstuedi !

I have a few questions for you.

@patrickstuedi patrickstuedi force-pushed the range_query1 branch 13 times, most recently from 2f7de81 to a3d59b4 Compare August 24, 2021 14:07
Member

@AlanConfluent AlanConfluent left a comment

Looking good. Just had a handful of comments.

Contributor

@agavra agavra left a comment

Thanks @patrickstuedi! Reviewed it out of interest :) left some comments, none are blockers

@patrickstuedi patrickstuedi force-pushed the range_query1 branch 13 times, most recently from 10838f5 to 6cbccba Compare August 31, 2021 20:09
Contributor

@agavra agavra left a comment

This LGTM! You should wait for @AlanConfluent or @vvcephei's +1 as well as they have a bit more context into the code than I do.

@patrickstuedi patrickstuedi force-pushed the range_query1 branch 2 times, most recently from 81476e0 to 194f27d Compare September 1, 2021 12:39
@@ -447,10 +444,49 @@ public Void visitComparisonExpression(
final Object key = resolveKey(other, col.get(), metaStore, ksqlConfig, node);
keyContents[col.get().index()] = key;
seenKeys.set(col.get().index());
operatorType = node.getType();
Member

You're saying that currently, if we have a multi-column key, we can't do table-scan queries today? It seems like we ought to have a test for that kind of thing. It also seems like that restriction isn't strictly necessary.

I didn't mean to imply that. Table scans should work for this today. The validator above checks whether any operator isn't an "=" and, if so, potentially makes the query a scan. Currently, only if they're all "=" will it skip a table scan and do a key lookup. To handle range queries, this all still holds true, but we can additionally carve out the case of a single key column with a range operator from the cases that were being handled by table scan, and now handle that with a range query.
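The three cases described in this comment could be sketched roughly as follows. This is not the code from this PR: `LookupPlanner`, `Op`, and `Plan` are hypothetical names. The sketch also assumes, conservatively, that only a single comparison on a single key column qualifies as a range query, and it ignores other conditions that the real validator checks.

```java
import java.util.List;

// Hypothetical sketch of the lookup-type decision; not ksqlDB's actual classes.
public class LookupPlanner {

  public enum Op { EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL }

  public enum Plan { KEY_LOOKUP, RANGE_SCAN, TABLE_SCAN }

  public static Plan plan(final int keyColumnCount, final List<Op> keyOperators) {
    if (keyOperators.isEmpty()) {
      // No constraint on the key: nothing to look up by.
      return Plan.TABLE_SCAN;
    }
    final boolean allEqual = keyOperators.stream().allMatch(op -> op == Op.EQUAL);
    if (allEqual) {
      // Only "=" comparisons: exact key lookup, as before.
      return Plan.KEY_LOOKUP;
    }
    if (keyColumnCount == 1 && keyOperators.size() == 1) {
      // Assumption: one range comparison on a single-column key.
      return Plan.RANGE_SCAN;
    }
    // Everything else keeps falling back on a table scan.
    return Plan.TABLE_SCAN;
  }

  public static void main(final String[] args) {
    System.out.println(plan(1, List.of(Op.EQUAL)));
    System.out.println(plan(1, List.of(Op.LESS_THAN)));
    System.out.println(plan(2, List.of(Op.LESS_THAN, Op.LESS_THAN)));
  }
}
```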

@@ -447,10 +444,49 @@ public Void visitComparisonExpression(
final Object key = resolveKey(other, col.get(), metaStore, ksqlConfig, node);
keyContents[col.get().index()] = key;
seenKeys.set(col.get().index());
operatorType = node.getType();
Member

I believe the current code, if you have multiple key columns, will not behave correctly for a query like this:

  • KEY1 < 10 AND KEY2 = 5

operatorType will be set to the latest operator seen in the AST traversal, which is equals. It will, based on that, assume they're all equality, which was a simplifying assumption we could make before because we only allowed equality, but now isn't quite accurate. I think the current code would just do an equality lookup rather than range operator on one column. If we see this intermixing of operators, we should just fall back on table scan.

To handle that, I think we need to keep track of all of the operators. Only if they're all "=" can the lookup operator be equality. I think intermixing means a table scan, and if it's a single-column key, then range operators can result in range lookup operators. My comments above were meant to say that even if we have all "<" for a multi-column key, we wouldn't want to do a range scan, since it wouldn't make sense.

At the moment, multi-column keys only work for equality, so other operators should throw an error in that case.

I should have said fall back on table scan.
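The concern raised in this thread can be reproduced in isolation. The sketch below is hypothetical (`OperatorTracking` and its methods are not from the PR). It models the visitor as a loop over operator strings and contrasts a single overwritten field with a check over every operator seen, using `KEY1 < 10 AND KEY2 = 5` as the input.

```java
import java.util.List;

// Hypothetical sketch; operators are modeled as plain strings in visit order.
public class OperatorTracking {

  // Single-field approach: each visited comparison overwrites the previous one,
  // so only the last operator in the traversal survives.
  public static String lastOperatorSeen(final List<String> visitedOperators) {
    String operatorType = null;
    for (final String op : visitedOperators) {
      operatorType = op;
    }
    return operatorType;
  }

  // Tracking every operator: a key lookup is only valid if all of them are "=".
  public static boolean allEquality(final List<String> visitedOperators) {
    return !visitedOperators.isEmpty()
        && visitedOperators.stream().allMatch("="::equals);
  }

  public static void main(final String[] args) {
    // KEY1 < 10 AND KEY2 = 5
    final List<String> ops = List.of("<", "=");
    System.out.println("last operator seen: " + lastOperatorSeen(ops));
    System.out.println("all equality: " + allEquality(ops));
  }
}
```

With the single field, the query looks like an equality lookup because "=" was visited last. Checking all operators shows the mix, which is the case that should fall back on a table scan.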

@patrickstuedi patrickstuedi force-pushed the range_query1 branch 3 times, most recently from 2d78bd7 to 870ffbc Compare September 2, 2021 22:28
@AlanConfluent
Member

I think I mistakenly commented on a version that matches your latest changes but predates your rebase.

This looks good. I don't think you handled windowed tables yet. If you're bold, you can try sticking it in this PR, but it would also be fine to do it in a followup.

@patrickstuedi patrickstuedi force-pushed the range_query1 branch 5 times, most recently from 77798d4 to 4940bb8 Compare September 3, 2021 14:40
@patrickstuedi patrickstuedi merged commit 22a79bc into confluentinc:master Sep 6, 2021