From 19521a1c07a466e2defb8204d08086aee0df9b14 Mon Sep 17 00:00:00 2001 From: cprasad1 Date: Thu, 27 May 2021 13:40:40 -0700 Subject: [PATCH] docs: rewrite pull query docs page (#7604) * chore: fix pull queries in docs/concepts/queries.md (#7597) * chore: fix pull queries in docs/concepts/queries.md * nit in language Co-authored-by: Chittaranjan Prasad <> * docs: rewrite pull query docs page (#7532) * chore: rewrite pull query docs page * revise some more * incorporate Alan's comment * explicitly call out features and limitations * Update docs/developer-guide/ksqldb-reference/select-pull-query.md Co-authored-by: Almog Gavra * Update docs/developer-guide/ksqldb-reference/select-pull-query.md Co-authored-by: Almog Gavra * Update docs/developer-guide/ksqldb-reference/select-pull-query.md Co-authored-by: Almog Gavra * Update docs/developer-guide/ksqldb-reference/select-pull-query.md Co-authored-by: Almog Gavra * Update docs/developer-guide/ksqldb-reference/select-pull-query.md Co-authored-by: Almog Gavra * nits Co-authored-by: Chittaranjan Prasad <> Co-authored-by: Almog Gavra Co-authored-by: Almog Gavra --- .../ksqldb-reference/select-pull-query.md | 167 ++++++++++-------- 1 file changed, 97 insertions(+), 70 deletions(-) diff --git a/docs/developer-guide/ksqldb-reference/select-pull-query.md b/docs/developer-guide/ksqldb-reference/select-pull-query.md index 96de2cffa322..c0291e6eef48 100644 --- a/docs/developer-guide/ksqldb-reference/select-pull-query.md +++ b/docs/developer-guide/ksqldb-reference/select-pull-query.md @@ -14,98 +14,125 @@ Synopsis ```sql SELECT select_expr [, ...] - FROM aggregate_table - WHERE key_column=key [AND ...] - [AND window_bounds]; + FROM table + [ WHERE where_condition ] + [ AND window_bounds ]; ``` Description ----------- -Pulls the current value from the materialized table and terminates. The result -of this statement isn't persisted in a {{ site.ak }} topic and is printed out -only in the console.
- -Pull queries enable you to fetch the current state of a materialized view. -Because materialized views are incrementally updated as new events arrive, -pull queries run with predictably low latency. They're a great match for -request/response flows. For asynchronous application flows, see +Pulls the current value from the materialized view and terminates. The result +of this statement is not persisted in a {{ site.ak }} topic and is printed out +only in the console. Pull queries run with predictably low latency because +materialized views are incrementally updated as new events arrive. +They are a great match for request/response flows. For asynchronous application flows, see [Push Queries](select-push-query.md). -Execute a pull query by sending an HTTP request to the ksqlDB REST API, and +You can execute a pull query by sending an HTTP request to the ksqlDB REST API, and the API responds with a single response. -The WHERE clause must contain a value for each primary-key column to retrieve and may -optionally include bounds on `WINDOWSTART` and `WINDOWEND` if the materialized table is windowed. -For more information, see -[Time and Windows in ksqlDB](../../concepts/time-and-windows-in-ksqldb-queries.md). +- Pull queries are expressed using a strict subset of ANSI SQL. +- You can issue a pull query against any table that was created by a + [CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement. +- Currently, we do not support pull queries against tables created by using a [CREATE TABLE](../../ksqldb-reference/create-table) statement. +- Pull queries do not support `JOIN`, `PARTITION BY`, `GROUP BY` and `WINDOW` clauses (but can query materialized tables that contain those clauses) + +WHERE Clause Guidelines +----------------------- + +By default, only key lookups are enabled. They have the following requirements: +- Key column(s) must use an equality comparison to a literal (e.g. KEY = 'abc'). 
+- On windowed tables, WINDOWSTART and WINDOWEND can be optionally compared to literals. + For more information on windowed tables, see [Time and Windows in ksqlDB](../../concepts/time-and-windows-in-ksqldb-queries.md). + +You can loosen the restrictions on the `WHERE` clause, or eliminate the `WHERE` clause altogether, +by enabling table scans in your current CLI session with the command `SET 'ksql.query.pull.table.scan.enabled'='true';`. +Table scans can also be enabled by default by setting a server configuration property with +`ksql.query.pull.table.scan.enabled=true`. Once table scans are enabled, the following additional expressions are allowed: +- Key column(s) using range comparisons to literals. +- Non-key columns to be used alone, without key references. +- Columns to be compared to other columns. +- References to subsets of columns from a multi-column key. +- Complex expressions without direct column references (e.g. instr(NAME_COL, 'hello') > 0). -Example -------- +!!! note + Table scan based queries are just the next incremental step for ksqlDB pull queries. + In future releases, we will continue pushing the envelope of new query capabilities and + greater performance and efficiency. +Examples +-------- +Pull queries against a table `TOP_TEN_RANKS` created by using a +[CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement: +First, create a table named `GRADES` by using a [CREATE TABLE](../../ksqldb-reference/create-table) +statement: ```sql -SELECT * FROM pageviews_by_region - WHERE regionId = 'Region_1' - AND 1570051876000 <= WINDOWSTART AND WINDOWEND <= 1570138276000; +CREATE TABLE GRADES (ID INT PRIMARY KEY, GRADE STRING, RANK INT) + WITH (kafka_topic = 'test_topic', value_format = 'JSON', partitions = 4); ``` - -If the `pageviews_by_region` table was created as an aggregation of multiple columns, -then each key column must be present in the WHERE clause. 
The following example shows how to -query the table if `countryId` and `regionId` where both key columns: - +Then, create a derived table named `TOP_TEN_RANKS` by using a +[CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement: ```sql -SELECT * FROM pageviews_by_region - WHERE countryId = 'USA' AND regionId = 'Region_1' - AND 1570051876000 <= WINDOWSTART AND WINDOWEND <= 1570138276000; +CREATE TABLE TOP_TEN_RANKS + AS SELECT ID, RANK + FROM GRADES + WHERE RANK <= 10; ``` - -When writing logical expressions using `WINDOWSTART` or `WINDOWEND`, you can use ISO-8601 -formatted datestrings to represent date times. For example, the previous -query is equivalent to the following: - +If you want to look up only the student with `ID = 5` in the `TOP_TEN_RANKS` table using a pull query: ```sql -SELECT * FROM pageviews_by_region - WHERE regionId = 'Region_1' - AND '2019-10-02T21:31:16' <= WINDOWSTART AND WINDOWEND <= '2019-10-03T21:31:16'; +SELECT * FROM TOP_TEN_RANKS + WHERE ID = 5; ``` - -You can specify time zones within the datestring. For example, -`2017-11-17T04:53:45-0330` is in the Newfoundland time zone. If no time zone is -specified within the datestring, then timestamps are interpreted in the UTC -time zone. - -If no bounds are placed on `WINDOWSTART` or `WINDOWEND`, rows are returned for all windows -in the windowed table. - -Also, you can issue a pull query against a derived table that was created by using the [CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement. 
- - +After enabling table scans, you can fetch the current state of your `TOP_TEN_RANKS` table using a pull query: ```sql -CREATE TABLE GRADES (ID INT PRIMARY KEY, GRADE STRING, RANK INT) - WITH (kafka_topic = 'test_topic', value_format = 'JSON', partitions = 1); +SELECT * FROM TOP_TEN_RANKS; +``` +If you want to look up the students whose IDs lie in the range `(4, 8)`: +```sql +SELECT * FROM TOP_TEN_RANKS + WHERE ID > 4 AND ID < 8; ``` -Create a derived table, named -`TOP_TEN_RANKS`, by using a [CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement: - - ```sql -CREATE TABLE TOP_TEN_RANKS - AS SELECT ID, RANK - FROM GRADES - WHERE RANK <= 10; - ``` -You can fetch the current state of your materialized view, which is -the `TOP_TEN_RANKS` derived table, by using a pull query: +Pull queries against a table `INNER_JOIN` that is created by joining multiple tables: ```sql -SELECT * FROM TOP_TEN_RANKS; +CREATE TABLE LEFT_TABLE (ID BIGINT PRIMARY KEY, NAME varchar, VALUE bigint) + WITH (kafka_topic='left_topic', value_format='JSON', partitions=4); +``` +```sql +CREATE TABLE RIGHT_TABLE (ID BIGINT PRIMARY KEY, F1 varchar, F2 bigint) + WITH (kafka_topic='right_topic', value_format='JSON', partitions=4); +``` +```sql +CREATE TABLE INNER_JOIN AS SELECT L.ID, NAME, VALUE, F1, F2 FROM LEFT_TABLE L JOIN RIGHT_TABLE R ON L.ID = R.ID; +``` +You can fetch the current state of your table `INNER_JOIN` by using a pull query: +```sql +SELECT * FROM INNER_JOIN [ WHERE where_condition ]; ``` -The following statement looks up only the student with `ID = 5` in the derived table: +Pull queries against a windowed table `NUMBER_OF_TESTS` created by aggregating a stream `STUDENTS`: ```sql -SELECT * FROM TOP_TEN_RANKS - WHERE ID = 5; +CREATE STREAM STUDENTS (ID STRING KEY, SCORE INT) + WITH (kafka_topic='students_topic', value_format='JSON', partitions=4); +``` +```sql +CREATE TABLE NUMBER_OF_TESTS AS + SELECT ID, COUNT(1) AS COUNT + FROM STUDENTS + WINDOW
TUMBLING(SIZE 1 SECOND) + GROUP BY ID; +``` +Look up the number of tests taken by a student with `ID='10'`: +```sql +SELECT * + FROM NUMBER_OF_TESTS + WHERE ID='10'; +``` +Look up the number of tests taken by a student with `ID='10'` +in the window range `100 <= WindowStart AND WindowEnd <= 16000`: +```sql +SELECT * + FROM NUMBER_OF_TESTS + WHERE ID='10' AND 100 <= WindowStart AND WindowEnd <= 16000; ``` -!!! note - Currently, tables derived using Table-Table joins aren't queryable directly. To derive a queryable table, you can do: - `CREATE TABLE QUERYABLE_JOIN_TABLE AS SELECT * FROM JOIN_TABLE;` and then issue pull queries against `QUERYABLE_JOIN_TABLE`. -
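The rewritten page lists several `WHERE` forms that become available once table scans are enabled, but its examples only cover the unfiltered scan and a key range. The sketch below shows what the remaining forms might look like against the `TOP_TEN_RANKS` and `INNER_JOIN` tables defined in the patch. The predicates themselves are illustrative assumptions and are not part of the commit; only the `SET` statement and the table and column names come from the patch.

```sql
-- Enable table scans for the current CLI session (setting taken from the patch).
SET 'ksql.query.pull.table.scan.enabled'='true';

-- Range comparison on the key column (assumed predicate).
SELECT * FROM TOP_TEN_RANKS
  WHERE ID >= 3;

-- Non-key column used alone, without a key reference (assumed predicate).
SELECT * FROM TOP_TEN_RANKS
  WHERE RANK = 1;

-- One column compared to another column (assumed predicate).
SELECT * FROM TOP_TEN_RANKS
  WHERE ID = RANK;

-- Function call over a column instead of a direct column comparison,
-- modeled on the instr(NAME_COL, 'hello') > 0 example in the patch.
SELECT * FROM INNER_JOIN
  WHERE INSTR(NAME, 'hello') > 0;
```

Without the `SET` statement (or the equivalent server property), the patch states that only key lookups are enabled by default, so these queries would be expected to fail.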