Support aliases in PARTITION BY and GROUP BY #4952

@big-andy-coates big-andy-coates commented Apr 1, 2020

Description

fixes: #4881
fixes: #4813

The commit adds support for aliases in GROUP BY and PARTITION BY clauses. For example:

-- input schema: A -> B, C
CREATE STREAM OUTPUT AS 
   SELECT A, B FROM INPUT 
   PARTITION BY C AS D;
-- output schema: D -> A, B.  (where D contains same data as C from input)

-- input schema: A -> B, C
CREATE TABLE OUTPUT AS
   SELECT COUNT(1) AS COUNT FROM INPUT
   GROUP BY C AS D;
-- output schema: D -> COUNT.  (where D contains same data as C from input)

This is particularly useful where the new key column name would be generated, e.g.

-- without alias:
-- input schema: A -> B, C
CREATE STREAM OUTPUT AS 
   SELECT A, B FROM INPUT 
   PARTITION BY UDF(C);
-- output schema: KSQL_COL_0 -> A, B

-- with alias:
-- input schema: A -> B, C
CREATE STREAM OUTPUT AS 
   SELECT A, B FROM INPUT 
   PARTITION BY UDF(C) AS D;
-- output schema: D -> A, B

PARTITION BY only supports a single value expression, so aliasing works as you'd expect: PARTITION BY X AS Y.

GROUP BY supports multiple value expressions, e.g. GROUP BY X, Y. This currently creates a new STRING key column with a generated name, e.g. KSQL_COL_0. The commit allows the full set of expressions to be aliased, e.g. GROUP BY (X, Y) AS Z. This is an interim solution: once ksqlDB supports multiple key columns, the ability to alias the whole set will be removed and replaced with support for aliasing each expression in the GROUP BY, e.g. GROUP BY W AS X, Y AS Z.
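
The naming rule above can be sketched roughly as follows. This is an illustration only: `GroupByKeyNaming`, `keyColumnName` and the `generatedName` parameter are hypothetical names, not taken from the ksqlDB codebase. The only behaviour assumed is what is described above: an alias, when present, names the key column, otherwise a generated name such as KSQL_COL_0 is used.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, NOT part of ksqlDB: models how the single key column
// produced by a multi-expression GROUP BY gets its name.
final class GroupByKeyNaming {

    private GroupByKeyNaming() {
    }

    /**
     * @param expressions   the GROUP BY expressions, e.g. ["X", "Y"]
     * @param alias         the alias from GROUP BY (X, Y) AS Z, if one was given
     * @param generatedName the name that would otherwise be generated, e.g. KSQL_COL_0
     * @return the name of the key column in the output schema
     */
    static String keyColumnName(
            final List<String> expressions,
            final Optional<String> alias,
            final String generatedName
    ) {
        if (expressions.isEmpty()) {
            throw new IllegalArgumentException("GROUP BY needs at least one expression");
        }
        // An explicit alias always wins; otherwise fall back to the generated name.
        return alias.orElse(generatedName);
    }
}
```

Under these assumptions, GROUP BY (X, Y) AS Z yields a key column named Z, while GROUP BY X, Y keeps the generated name.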

Testing done

usual

How to review.

Review the first commit which:

  • updates the grammar to support aliases on GROUP BY and PARTITION BY
  • adds a PartitionBy AstNode, much like the existing GroupBy one.

Review the second commit which wires in the aliases. Flow is:

  • Tweaks the grammar to support aliasing of a single GROUP BY expression.
  • Adds an optional alias to GroupBy and PartitionBy.
  • Updates SqlFormatter to know about aliases.
  • Changes Analysis to take the whole GroupBy and PartitionBy nodes, rather than extract info out of them.
  • Changes the LogicalPlanner to use the alias when building schemas and to pass it along where needed.
  • Changes the existing StreamsSelectKey query plan step, used for PARTITION BY, to have an optional alias. (Backwards compatible, as it's optional.)
  • Changes the existing StreamsGroupBy and TableGroupBy query plan steps, used by GROUP BY, to have an optional alias. (Backwards compatible, as it's optional.)
  • Updates the builders for the above steps to do the right thing with regards to the new optional alias.
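
To picture the optional alias carried through the steps above, here is a small stand-in node with a formatter. It is a sketch under stated assumptions, not the actual implementation: `GroupByNode` and `format` are hypothetical, and the real GroupBy, PartitionBy and SqlFormatter classes may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for an AST node that carries an optional alias.
// NOT the ksqlDB GroupBy class; names and shape are assumptions.
final class GroupByNode {

    private final List<String> expressions;
    private final Optional<String> alias;

    GroupByNode(final List<String> expressions, final Optional<String> alias) {
        this.expressions = Collections.unmodifiableList(new ArrayList<>(expressions));
        this.alias = Objects.requireNonNull(alias, "alias");
    }

    Optional<String> getAlias() {
        return alias;
    }

    // Renders the clause back to SQL text, including the alias when present.
    String format() {
        final String joined = String.join(", ", expressions);
        if (!alias.isPresent()) {
            // No alias: same text as before the alias was introduced.
            return "GROUP BY " + joined;
        }
        // Several expressions are parenthesised so the alias applies to the whole set.
        final String target = expressions.size() > 1 ? "(" + joined + ")" : joined;
        return "GROUP BY " + target + " AS " + alias.get();
    }
}
```

In this sketch a node built with `Optional.empty()` formats exactly as it would without the feature, which mirrors why making the alias optional keeps the existing plan steps backwards compatible.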

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@big-andy-coates big-andy-coates requested a review from a team as a code owner April 1, 2020 12:26
Conflicting files
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/Analyzer.java
ksqldb-engine/src/main/java/io/confluent/ksql/structured/SchemaKStream.java
ksqldb-engine/src/test/java/io/confluent/ksql/structured/SchemaKStreamTest.java
Conflicting files
ksqldb-engine/src/main/java/io/confluent/ksql/engine/rewrite/AstSanitizer.java
ksqldb-engine/src/main/java/io/confluent/ksql/planner/LogicalPlanner.java
ksqldb-streams/src/main/java/io/confluent/ksql/execution/streams/GroupByParamsFactory.java
@agavra agavra left a comment


LGTM - I did a pretty quick scan as it's a large PR, but it seems pretty straightforward. If there's anything specific you want me to look at, let me know.

Let's not forget the docs PR after this :)

.getColumnName();

if (node.getAlias().get().equals(groupByColName)) {
// Alias is a no-op - remove it:

just wondering, why not just go through with it?

@big-andy-coates

Docs will be covered by: #4686 in a single pass.

@big-andy-coates big-andy-coates merged commit 7abab48 into confluentinc:master Apr 8, 2020
@big-andy-coates big-andy-coates deleted the partition_and_group_by_aliasing branch April 8, 2020 17:30