-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes in the AST node to support the struct fields correctly. #4
base: Query-Rewrite-Feature-Using-AST
Are you sure you want to change the base?
Changes in the AST node to support the struct fields correctly. #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have an ambiguity here that we need to clarify and document the behavior for. What if I have the following stream:
CREATE STREAM S (S STRUCT<A int>, A int) ...
And then the statement
SELECT S.A FROM S
Which A should the expression refer to? I think it would be most intuitive to default to the unqualified field name in this case, but that's just a personal preference. It seems like what's implemented is to default to the qualified name. Which is fine, but we should remember to clarify that in the docs.
Also, I think the current implementation will fail for the expression S.A
if the stream was defined like this
CREATE STREAM S (S STRUCT <A int>)...
, since it assumes that if the first-level identifier in a column reference is the stream/table name then the identifier refers to the stream or table, which may or may not be the case.
); | ||
return new DereferenceExpression(getLocation(context), baseExpression, columnName); | ||
if (columnName.equalsIgnoreCase(dataSourceExtractor.getFromAlias()) | ||
|| columnName.equalsIgnoreCase(dataSourceExtractor.getFromName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the column has the same name as the stream/table? Then we'd return a QualifiedNameReference when we should return a DereferenceExpression right? For starters I think that won't work with the extraction UDF.
@@ -1129,11 +1129,7 @@ public Node visitSubqueryExpression(SqlBaseParser.SubqueryExpressionContext cont | |||
@Override | |||
public Node visitDereference(SqlBaseParser.DereferenceContext context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify my understanding here:
If I have a statement like:
SELECT a.b.c from d
Then I should get something like:
DereferenceExpression(c)->DereferenceExpression(b)->DereferenceExpression(a)->QualifiedNameReference(d)
If DerefenceExpression(f)->e means a dereference of field f in expression e
@@ -225,6 +226,31 @@ public void testBooleanLogicalExpression() { | |||
|
|||
} | |||
|
|||
@Test | |||
public void shouldParseStructFieldAccessCorrectly() { | |||
String simpleQuery = "SELECT iteminfo.category.name, address.street FROM orders WHERE address.state = 'CA';"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: final
@@ -61,6 +65,17 @@ public DataSourceExtractor(final MetaStore metaStore) { | |||
this.metaStore = metaStore; | |||
} | |||
|
|||
public java.util.Map<String, String> getAliasToNameMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this being used here or in any other PRs
IMHO, this should result in a parsing error, asking the user to use a different stream name. This avoids the ambiguity. Anything else will leave users confused and bugs in their applications. We, (as in @hjafarpour), should definitely add tests covering such a query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @hjafarpour - thanks.
Aside from the bit about stream name matching a struct column's name, it LGTM.
Statement statement = KSQL_PARSER.buildAst(simpleQuery, metaStore).get(0); | ||
|
||
|
||
Assert.assertTrue("testSimpleQuery fails", statement instanceof Query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertThat(statement, is(instanceOf(Query.class)));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we drop the repeated "testSimpleQuery fails"
too please?
Query query = (Query) statement; | ||
assertThat("testSimpleQuery fails", query.getQueryBody(), instanceOf(QuerySpecification.class)); | ||
QuerySpecification querySpecification = (QuerySpecification)query.getQueryBody(); | ||
assertThat("testSimpleQuery fails", querySpecification.getSelect().getSelectItems().size(), equalTo(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use hasSize(2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find this method!
This PR makes changes in our AST so we can handle dot notation to access nested fields in struct correctly.
Previously we assumed that we have only one dot,
stream/table name
.column name
. How we will be able to have more than one dot in a dereferenced node.This is based on the previous PR for Query rewrite.