Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes in the AST node to support the struct fields correctly. #4

Open
wants to merge 1 commit into
base: Query-Rewrite-Feature-Using-AST
Choose a base branch
from

Conversation

hjafarpour
Copy link
Owner

@hjafarpour hjafarpour commented Jun 8, 2018

This PR makes changes in our AST so we can handle dot notation to access nested fields in struct correctly.
Previously we assumed that we have only one dot, stream/table name.column name. How we will be able to have more than one dot in a dereferenced node.
This is based on the previous PR for Query rewrite.

@hjafarpour hjafarpour self-assigned this Jun 8, 2018
Copy link

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have an ambiguity here that we need to clarify and document the behavior for. What if I have the following stream:

CREATE STREAM S (S STRUCT<A int>, A int) ...

And then the statement

SELECT S.A FROM S

Which A should the expression refer to? I think it would be most intuitive to default to the unqualified field name in this case, but that's just a personal preference. It seems like what's implemented is to default to the qualified name. Which is fine, but we should remember to clarify that in the docs.

Also, I think the current implementation will fail for the expression S.A if the stream was defined like this

CREATE STREAM S (S STRUCT <A int>)...

, since it assumes that if the first-level identifier in a column reference is the stream/table name then the identifier refers to the stream or table, which may or may not be the case.

);
return new DereferenceExpression(getLocation(context), baseExpression, columnName);
if (columnName.equalsIgnoreCase(dataSourceExtractor.getFromAlias())
|| columnName.equalsIgnoreCase(dataSourceExtractor.getFromName())) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the column has the same name as the stream/table? Then we'd return a QualifiedNameReference when we should return a DereferenceExpression right? For starters I think that won't work with the extraction UDF.

@@ -1129,11 +1129,7 @@ public Node visitSubqueryExpression(SqlBaseParser.SubqueryExpressionContext cont
@Override
public Node visitDereference(SqlBaseParser.DereferenceContext context) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify my understanding here:

If I have a statement like:

SELECT a.b.c from d

Then I should get something like:

DereferenceExpression(c)->DereferenceExpression(b)->DereferenceExpression(a)->QualifiedNameReference(d)

If DerefenceExpression(f)->e means a dereference of field f in expression e

@@ -225,6 +226,31 @@ public void testBooleanLogicalExpression() {

}

@Test
public void shouldParseStructFieldAccessCorrectly() {
String simpleQuery = "SELECT iteminfo.category.name, address.street FROM orders WHERE address.state = 'CA';";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: final

@@ -61,6 +65,17 @@ public DataSourceExtractor(final MetaStore metaStore) {
this.metaStore = metaStore;
}

public java.util.Map<String, String> getAliasToNameMap() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being used here or in any other PRs

@big-andy-coates
Copy link

big-andy-coates commented Jun 13, 2018

@rodesai

CREATE STREAM S (S STRUCT<A int>, A int) ...

Which A should the expression refer to? I think it would be most intuitive to default to the unqualified field name in this case, but that's just a personal preference.

IMHO, this should result in a parsing error, asking the user to use a different stream name. This avoids the ambiguity. Anything else will leave users confused and bugs in their applications.

We, (as in @hjafarpour), should definitely add tests covering such a query.

Copy link

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @hjafarpour - thanks.

Aside from the bit about stream name matching a struct column's name, it LGTM.

Statement statement = KSQL_PARSER.buildAst(simpleQuery, metaStore).get(0);


Assert.assertTrue("testSimpleQuery fails", statement instanceof Query);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertThat(statement, is(instanceOf(Query.class)));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop the repeated "testSimpleQuery fails" too please?

Query query = (Query) statement;
assertThat("testSimpleQuery fails", query.getQueryBody(), instanceOf(QuerySpecification.class));
QuerySpecification querySpecification = (QuerySpecification)query.getQueryBody();
assertThat("testSimpleQuery fails", querySpecification.getSelect().getSelectItems().size(), equalTo(2));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use hasSize(2)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find this method!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants