feat: Adds Scalable Push Query physical operators #7430

AlanConfluent · 2021-04-24T00:37:49Z

Description

This adds the Scalable Push Query physical operators. It introduces a new operator, PeekStreamOperator, which registers a ProcessingQueue with the ScalablePushRegistry. This operator is combined with existing pull query operators, such as ProjectOperator and SelectOperator, to form a complete query execution. The operators are created by the newly introduced PushPhysicalPlanBuilder, which produces a PushPhysicalPlan that performs the actual execution.
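The operator composition described above can be pictured as a chain of pull-style operators sitting on top of a leaf that reads from a registered queue. This is a minimal, hypothetical sketch: `Operator`, `RegistrySketch`, `ProcessingQueueSketch`, and the `*OperatorSketch` classes are illustrative stand-ins, not ksqlDB's actual classes or signatures. It assumes a Volcano-style open/next/close interface in which `next()` returns `null` when no row is currently available.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical pull-style operator: next() returns null when no row is available right now.
interface Operator {
  void open();
  Object next();
  void close();
}

// Stand-in for a per-request ProcessingQueue (unbounded here for simplicity).
class ProcessingQueueSketch {
  private final Queue<Object> rows = new ArrayDeque<>();
  void offer(Object row) { rows.add(row); }
  Object poll() { return rows.poll(); }
}

// Stand-in for ScalablePushRegistry: fans every published row out to all registered queues.
class RegistrySketch {
  private final List<ProcessingQueueSketch> queues = new ArrayList<>();
  void register(ProcessingQueueSketch queue) { queues.add(queue); }
  void unregister(ProcessingQueueSketch queue) { queues.remove(queue); }
  void publish(Object row) { for (ProcessingQueueSketch q : queues) q.offer(row); }
}

// Leaf operator: registers its own queue on open() and reads ("peeks") rows from it.
class PeekOperatorSketch implements Operator {
  private final RegistrySketch registry;
  private final ProcessingQueueSketch queue = new ProcessingQueueSketch();
  PeekOperatorSketch(RegistrySketch registry) { this.registry = registry; }
  public void open() { registry.register(queue); }
  public Object next() { return queue.poll(); }
  public void close() { registry.unregister(queue); }
}

// Filter: skips child rows until one matches or the child has nothing more right now.
class SelectOperatorSketch implements Operator {
  private final Operator child;
  private final Predicate<Object> predicate;
  SelectOperatorSketch(Operator child, Predicate<Object> predicate) {
    this.child = child;
    this.predicate = predicate;
  }
  public void open() { child.open(); }
  public Object next() {
    Object row;
    while ((row = child.next()) != null) {
      if (predicate.test(row)) {
        return row;
      }
    }
    return null;
  }
  public void close() { child.close(); }
}

// Projection: transforms each child row.
class ProjectOperatorSketch implements Operator {
  private final Operator child;
  private final Function<Object, Object> projection;
  ProjectOperatorSketch(Operator child, Function<Object, Object> projection) {
    this.child = child;
    this.projection = projection;
  }
  public void open() { child.open(); }
  public Object next() {
    Object row = child.next();
    return row == null ? null : projection.apply(row);
  }
  public void close() { child.close(); }
}

class OperatorChainDemo {
  public static void main(String[] args) {
    RegistrySketch registry = new RegistrySketch();
    // Project(Select(Peek)): keep even numbers, then format them.
    Operator root = new ProjectOperatorSketch(
        new SelectOperatorSketch(new PeekOperatorSketch(registry), r -> ((Integer) r) % 2 == 0),
        r -> "row-" + r);
    root.open();
    for (int i = 1; i <= 5; i++) {
      registry.publish(i);
    }
    Object row;
    while ((row = root.next()) != null) {
      System.out.println(row); // prints row-2, then row-4
    }
    root.close();
  }
}
```

Because the leaf only reads from its own queue, the Project and Select operators above it need no push-specific logic, which is roughly the kind of reuse the description refers to.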

Note that PushPhysicalPlan executes asynchronously on a Vert.x Context. The idea is that passing rows along doesn't require any dedicated threads, so many long-running requests can execute at once without taxing thread pools.
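To illustrate executing without dedicated threads, here is a sketch that uses a single-threaded `ExecutorService` as a stand-in for a Vert.x Context; it does not use the Vert.x API. Each hypothetical `AsyncPlanSketch` schedules a short drain task on the shared context when rows arrive, instead of parking a thread per request, so one thread can serve many long-running queries.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical plan bound to a shared "context" (an Executor standing in for a Vert.x Context).
class AsyncPlanSketch {
  private final Executor context;
  private final Queue<String> pending = new ArrayDeque<>();
  private final List<String> delivered = new ArrayList<>();
  private boolean drainScheduled = false;

  AsyncPlanSketch(Executor context) {
    this.context = context;
  }

  // Producer side: enqueue the row and schedule at most one drain task at a time; never blocks.
  void onNewRow(String row) {
    boolean schedule;
    synchronized (this) {
      pending.add(row);
      schedule = !drainScheduled;
      drainScheduled = true;
    }
    if (schedule) {
      context.execute(this::drain);
    }
  }

  // Runs on the context: hands over everything pending, then returns the thread to the context.
  private synchronized void drain() {
    String row;
    while ((row = pending.poll()) != null) {
      delivered.add(row); // stand-in for writing the row to the client response
    }
    drainScheduled = false;
  }

  synchronized List<String> delivered() {
    return new ArrayList<>(delivered);
  }
}

class AsyncPlanDemo {
  public static void main(String[] args) throws InterruptedException {
    // One thread serves all 1,000 plans; none of them owns a dedicated thread.
    ExecutorService context = Executors.newSingleThreadExecutor();
    List<AsyncPlanSketch> plans = new ArrayList<>();
    for (int i = 0; i < 1_000; i++) {
      plans.add(new AsyncPlanSketch(context));
    }
    for (int r = 0; r < 3; r++) {
      for (AsyncPlanSketch plan : plans) {
        plan.onNewRow("row-" + r);
      }
    }
    context.shutdown(); // already-scheduled drain tasks still run
    context.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println(plans.get(0).delivered()); // [row-0, row-1, row-2]
  }
}
```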

This PR also moves the operators that are now shared by pull and push queries into a common package.

Testing done

Ran unit tests.

Reviewer checklist

Ensure docs are updated if necessary (e.g. if a user-visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")

cprasad1

LGTM @AlanConfluent. Just a few questions

cprasad1 · 2021-04-29T04:01:29Z

...db-engine/src/main/java/io/confluent/ksql/physical/scalablepush/PushPhysicalPlanBuilder.java

+  }
+
+  private QueryId uniqueQueryId() {
+    return new QueryId("query_" + System.currentTimeMillis());


Can we make the query ID convey some information about the 'topology' of the query and make the ID richer?

guozhangwang · 2021-04-29T20:54:00Z

+1. We can include some data source context here.

AlanConfluent · 2021-05-12T23:23:51Z

I changed the prefix to SCALABLE_PUSH_QUERY_.

The only queries whose IDs carry anything more are persistent queries, because those IDs are user visible. For example, normal push query IDs are built here: https://github.com/confluentinc/ksql/blob/master/ksqldb-engine/src/main/java/io/confluent/ksql/engine/QueryIdUtil.java#L103. At least for now, QueryIds for pull and push queries aren't used in any meaningful way. I'm happy to pipe in meaningful info, such as the source name, if we end up using it somewhere.
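A sketch of the ID scheme after the prefix change, assuming the timestamp suffix from the reviewed code was kept. `ScalablePushQueryIds` is hypothetical and returns plain strings rather than ksqlDB's `QueryId`. The second variant adds a hypothetical in-process counter, since a timestamp alone can repeat for two requests that arrive in the same millisecond.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper; the real code wraps the string in ksqlDB's QueryId type, omitted here.
final class ScalablePushQueryIds {
  static final String PREFIX = "SCALABLE_PUSH_QUERY_";
  private static final AtomicLong COUNTER = new AtomicLong();

  private ScalablePushQueryIds() {}

  // Timestamp-only variant: the reviewed code with the new prefix.
  static String fromTimestamp(long nowMillis) {
    return PREFIX + nowMillis;
  }

  // Adds a process-wide counter so two IDs minted in the same millisecond still differ.
  static String unique(long nowMillis) {
    return PREFIX + nowMillis + "_" + COUNTER.incrementAndGet();
  }
}

class QueryIdDemo {
  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(ScalablePushQueryIds.fromTimestamp(now));
    System.out.println(ScalablePushQueryIds.unique(now));
  }
}
```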

cprasad1 · 2021-04-29T04:12:57Z

...db-engine/src/main/java/io/confluent/ksql/physical/scalablepush/PushPhysicalPlanBuilder.java

+      if (currentLogicalNode instanceof PullProjectNode) {
+        currentPhysicalOp = translateProjectNode((PullProjectNode)currentLogicalNode);
+      } else if (currentLogicalNode instanceof PullFilterNode) {
+        currentPhysicalOp = translateFilterNode((PullFilterNode) currentLogicalNode);
+      } else if (currentLogicalNode instanceof DataSourceNode) {


Should we rename PullProjectNode and PullFilterNode to TransientProjectNode and TransientFilterNode in the codebase, now that they are used by different types of queries? This would make the naming more generic.

AlanConfluent · 2021-05-12T23:59:05Z

Called it QueryProjectNode, per offline discussion.
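The `instanceof` dispatch in the hunk above maps each logical node type to one physical operator. Below is a hypothetical sketch with simplified node classes (`LogicalNode`, `ProjectLogicalNode`, `FilterLogicalNode`, `SourceLogicalNode`); the real classes, including the renamed ones, have different fields and constructors, and the translated operators are represented here by descriptive strings.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified logical plan nodes.
abstract class LogicalNode {
  abstract List<LogicalNode> getSources();
}

class ProjectLogicalNode extends LogicalNode {
  final LogicalNode source;
  final List<String> columns;
  ProjectLogicalNode(LogicalNode source, List<String> columns) {
    this.source = source;
    this.columns = columns;
  }
  List<LogicalNode> getSources() { return Collections.singletonList(source); }
}

class FilterLogicalNode extends LogicalNode {
  final LogicalNode source;
  final String predicate;
  FilterLogicalNode(LogicalNode source, String predicate) {
    this.source = source;
    this.predicate = predicate;
  }
  List<LogicalNode> getSources() { return Collections.singletonList(source); }
}

class SourceLogicalNode extends LogicalNode {
  final String name;
  SourceLogicalNode(String name) { this.name = name; }
  List<LogicalNode> getSources() { return Collections.emptyList(); }
}

final class Translator {
  private Translator() {}

  // One branch per supported node type; operators are represented as descriptive strings.
  static String translate(LogicalNode node) {
    if (node instanceof ProjectLogicalNode) {
      return "ProjectOperator" + ((ProjectLogicalNode) node).columns;
    } else if (node instanceof FilterLogicalNode) {
      return "SelectOperator(" + ((FilterLogicalNode) node).predicate + ")";
    } else if (node instanceof SourceLogicalNode) {
      return "PeekStreamOperator(" + ((SourceLogicalNode) node).name + ")";
    }
    // Fail fast on node types this planner does not support.
    throw new IllegalArgumentException("Unsupported logical node: " + node.getClass().getName());
  }
}
```

The trailing `throw` is a design choice in this sketch: an unexpected node type fails plan building instead of being skipped silently.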

cprasad1 · 2021-04-29T04:14:53Z

...gine/src/main/java/io/confluent/ksql/physical/scalablepush/operators/PeekStreamOperator.java

+  @Override
+  public boolean droppedRows() {
+    return processingQueue.hasDroppedRows();
+  }


What is our intent exactly when we detect that some rows were dropped?

AlanConfluent · 2021-05-12T23:25:12Z

Our intent is to throw an error and stop the push query. Dropped rows effectively mean that the requester is reading too slowly.
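The intended behavior for dropped rows, failing the push query because the requester is reading too slowly, can be sketched with a bounded queue. `BoundedRowQueue`, `PushQueryReader`, and `SlowReaderException` are hypothetical names, and the capacity-based dropping policy is an assumption about how `hasDroppedRows()` becomes true.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical bounded per-request queue; the capacity policy is an assumption.
final class BoundedRowQueue {
  private final int capacity;
  private final Queue<Object> rows = new ArrayDeque<>();
  private boolean droppedRows = false;

  BoundedRowQueue(int capacity) { this.capacity = capacity; }

  // Producer side: never blocks the upstream query; if the reader is too slow, the row is dropped.
  synchronized boolean offer(Object row) {
    if (rows.size() >= capacity) {
      droppedRows = true;
      return false;
    }
    rows.add(row);
    return true;
  }

  synchronized Object poll() { return rows.poll(); }

  synchronized boolean hasDroppedRows() { return droppedRows; }
}

final class SlowReaderException extends RuntimeException {
  SlowReaderException(String message) { super(message); }
}

final class PushQueryReader {
  private PushQueryReader() {}

  // Consumer side: once any row was dropped the results are incomplete, so stop the query.
  static Object nextRow(BoundedRowQueue queue) {
    if (queue.hasDroppedRows()) {
      throw new SlowReaderException("Rows were dropped: the requester is reading too slowly");
    }
    return queue.poll();
  }
}

class DroppedRowsDemo {
  public static void main(String[] args) {
    BoundedRowQueue queue = new BoundedRowQueue(2);
    queue.offer("a");
    queue.offer("b");
    queue.offer("c"); // queue is full: "c" is dropped and the flag is set
    try {
      PushQueryReader.nextRow(queue);
    } catch (SlowReaderException e) {
      System.out.println("push query stopped: " + e.getMessage());
    }
  }
}
```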

guozhangwang · 2021-04-29T20:03:38Z

...db-engine/src/main/java/io/confluent/ksql/physical/scalablepush/PushPhysicalPlanBuilder.java

+import java.util.Objects;
+
+/**
+ * Traverses the logical plan top-down and creates a physical plan for pull queries.


nit typo: this should say push queries, here and in a few other places in the comments.

AlanConfluent · 2021-05-12T22:40:25Z

Ah, good catch. Updated.

guozhangwang · 2021-04-29T20:32:43Z

...db-engine/src/main/java/io/confluent/ksql/physical/scalablepush/PushPhysicalPlanBuilder.java

+      }
+      prevPhysicalOp = currentPhysicalOp;
+      // Exit the loop when a leaf node is reached
+      if (currentLogicalNode.getSources().isEmpty()) {


Just curious, is it true that only DataSourceNode falls in this case today?

AlanConfluent · 2021-05-12T23:01:42Z

Yeah. Once you get to the DataSourceNode, that should be it, at least in our current setup.
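The top-down traversal discussed in this thread, which exits once it reaches a node with no sources, can be sketched as follows. `SimpleNode`, `SimpleOp`, and `TopDownBuilder` are hypothetical; the sketch assumes a linear plan in which every non-leaf node has exactly one source, matching the current setup described in the reply, where the DataSourceNode is the only leaf.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical logical node: a name plus its sources (children).
final class SimpleNode {
  final String name;
  final List<SimpleNode> sources;
  SimpleNode(String name, SimpleNode... sources) {
    this.name = name;
    this.sources = List.of(sources);
  }
}

// Hypothetical physical operator that records its children.
final class SimpleOp {
  final String name;
  final List<SimpleOp> children = new ArrayList<>();
  SimpleOp(String name) { this.name = name; }
}

final class TopDownBuilder {
  private TopDownBuilder() {}

  // Translates each node from the root down, attaching it as the child of the previous operator.
  static SimpleOp build(SimpleNode root, Function<SimpleNode, SimpleOp> translate) {
    SimpleOp rootOp = null;
    SimpleOp prevOp = null;
    SimpleNode current = root;
    while (true) {
      SimpleOp currentOp = translate.apply(current);
      if (prevOp == null) {
        rootOp = currentOp;
      } else {
        prevOp.children.add(currentOp);
      }
      prevOp = currentOp;
      // Exit the loop when a leaf node (no sources) is reached.
      if (current.sources.isEmpty()) {
        return rootOp;
      }
      // This sketch only handles linear plans with a single source per node.
      if (current.sources.size() != 1) {
        throw new IllegalStateException("Expected one source for " + current.name);
      }
      current = current.sources.get(0);
    }
  }
}

class TopDownDemo {
  public static void main(String[] args) {
    SimpleNode plan = new SimpleNode("Project", new SimpleNode("Filter", new SimpleNode("DataSource")));
    SimpleOp op = TopDownBuilder.build(plan, n -> new SimpleOp(n.name + "Operator"));
    while (op != null) {
      System.out.println(op.name); // ProjectOperator, then FilterOperator, then DataSourceOperator
      op = op.children.isEmpty() ? null : op.children.get(0);
    }
  }
}
```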

guozhangwang · 2021-04-29T20:56:11Z

...gine/src/main/java/io/confluent/ksql/physical/scalablepush/operators/PeekStreamOperator.java

+
+  private final DataSourceNode logicalNode;
+  private final ScalablePushRegistry scalablePushRegistry;
+  private final ProcessingQueue processingQueue;


I did not find these two classes in the PR?

Never mind, I just realized this PR isn't against master :) I'll review the other PR, #7424.

AlanConfluent · 2021-05-12T23:24:28Z

The first PR is now merged, so hopefully this one is easier to deal with.

guozhangwang · 2021-04-29T22:38:34Z

ksqldb-engine/src/main/java/io/confluent/ksql/physical/scalablepush/PushPhysicalPlan.java

+    return schema;
+  }
+
+  public ScalablePushRegistry getScalablePushRegistry() {


This getter, and hence the member field, doesn't seem to be used anywhere in this PR or the other one?

AlanConfluent · 2021-05-12T22:39:26Z

It's used in another PR.

guozhangwang

Other than the above comments, LGTM.

My only question is around when and how the PushPhysicalPlanBuilder constructor would be called, and how we pick which PersistentQueryMetadata parameter to pass in. I will read the next PR for that :)

AlanConfluent

My only question is around when and how the PushPhysicalPlanBuilder constructor would be called, and how we pick which PersistentQueryMetadata parameter to pass in. I will read the next PR for that

Yes, it's coming next!

guozhangwang · 2021-05-13T05:35:00Z

@AlanConfluent maybe rebase the PR? :)

AlanConfluent · 2021-05-13T23:33:11Z

@AlanConfluent maybe rebase the PR? :)

@guozhangwang I had previously pointed this PR at a branch containing the previous PR, but I've since rebased it onto master after merging that one. Should be good now.

AlanConfluent requested a review from a team as a code owner April 24, 2021 00:37

AlanConfluent changed the base branch from master to scalable_push_queries_registry April 24, 2021 00:39

cprasad1 approved these changes Apr 29, 2021

guozhangwang reviewed Apr 29, 2021

guozhangwang approved these changes May 10, 2021

feat: Adds Scalable Push Query physical operators

b533453

AlanConfluent commented May 13, 2021

feedback

846e7c0

AlanConfluent force-pushed the scalable_push_queries_physical_planner branch from 421930d to 846e7c0 May 13, 2021 00:00

AlanConfluent requested a review from JimGalasyn as a code owner May 13, 2021 00:00

JimGalasyn approved these changes May 13, 2021

AlanConfluent changed the base branch from scalable_push_queries_registry to master May 13, 2021 16:44

Fix call to queue

706acfa

AlanConfluent merged commit 100767d into confluentinc:master May 13, 2021
