-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: support complex key pull queries #6628
Conversation
@@ -1927,17 +1914,90 @@ | |||
} | |||
}, | |||
{ | |||
"name": "IN: fail on non-literal key", | |||
"name": "non-windowed - function in where clause", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to support generic expressions in pull query WHERE clauses at this time? Last I spoke to @AlanConfluent it sounded like we wanted to limit support to literal expressions (i.e., those that create arrays and structs with literal values) for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be more work to not support this than to support this. Is there any reason why we don't want to?
], | ||
"expectedError": { | ||
"type": "io.confluent.ksql.rest.entity.KsqlStatementErrorMessage", | ||
"message": "Only comparison to literals is currently supported: (ID IN (CAST(1 AS INTEGER)))", | ||
"message": "Unsupported column reference in pull query: (COUNT + 1)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be supported is functions on literals, maps and arrays of literals but not functions on columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, essentially anything static (column references can't be used anywhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. In this PR we only want to support non-literal keys (maps and structs) and expressions that use only literals.
"statements": [ | ||
"CREATE STREAM INPUT (ID INT KEY, IGNORED INT) WITH (kafka_topic='test_topic', value_format='JSON');", | ||
"CREATE TABLE AGGREGATE AS SELECT ID, COUNT(1) AS COUNT FROM INPUT GROUP BY ID;", | ||
"SELECT * FROM AGGREGATE WHERE ID IN (CAST(10 AS INTEGER));" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we maybe make the IN take more than one key since that is its primary use case? Something like:
SELECT * FROM AGGREGATE WHERE ID IN (CAST(10 AS INTEGER), CAST("11" AS INTEGER));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comment about a test case
if (exp instanceof NullLiteral) { | ||
obj = null; | ||
} else if (exp instanceof Literal) { | ||
// skip the GenericExpressionResolver because this is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just done because GenericExpressionResolver
isn't as low overhead right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, that was exactly the idea
@@ -76,11 +83,12 @@ public Object resolve(final Expression expression) { | |||
|
|||
@Override | |||
protected Object visitExpression(final Expression expression, final Void context) { | |||
new EnsureNoColReferences(expression).process(expression, context); | |||
final ExpressionMetadata metadata = | |||
CodeGenRunner.compileExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, how much overhead is it to compile the java into bytecode, load the bytecode, and then run it. It makes a lot of sense when you compile once and then run many times, but in this case, there's a 1:1 relationship. In general, it seems like interpreting the expression would be lower overhead and possibly faster. Do we do it this way because this is the main method that currently exists for "evaluating a sql expression"?
I'd be curious to run a benchmark doing just expression lookups (rather than literals).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked offline about this, we'll look into running specialized benchmarks to see how much the overhead is. Given that this is just "new" functionality, we won't block the PR on this and we'll run those benchmarks going forward.
Description
fixes #6602
Supports key-lookups for keys that are not just literals. This is necessary for the generic key work being done because we will expose keys that are not just primitives, but can also be arrays and structs.
Note that maps are not tested because of #6621, we may consider just prohibiting maps from being keys altogether. I will port this over after #6375 is merged.
Testing done
Reviewer checklist