Skip to content

Commit

Permalink
docs: rewrite pull query docs page (#7603)
Browse files Browse the repository at this point in the history
* chore: fix pull queries in https://docs/concepts/queries.md (#7597)

* chore: fix pull queries in docs/concepts/queries.md

* nit in language

Co-authored-by: Chittaranjan Prasad <>

* docs: rewrite pull query docs page (#7532)

* chore: rewrite pull query docs page

* revise some more

* incorporate Alan's comment

* explicitly call out features and limitations

* Update docs/developer-guide/ksqldb-reference/select-pull-query.md

Co-authored-by: Almog Gavra <[email protected]>

* Update docs/developer-guide/ksqldb-reference/select-pull-query.md

Co-authored-by: Almog Gavra <[email protected]>

* Update docs/developer-guide/ksqldb-reference/select-pull-query.md

Co-authored-by: Almog Gavra <[email protected]>

* Update docs/developer-guide/ksqldb-reference/select-pull-query.md

Co-authored-by: Almog Gavra <[email protected]>

* Update docs/developer-guide/ksqldb-reference/select-pull-query.md

Co-authored-by: Almog Gavra <[email protected]>

* nits

Co-authored-by: Chittaranjan Prasad <>
Co-authored-by: Almog Gavra <[email protected]>

Co-authored-by: Almog Gavra <[email protected]>
  • Loading branch information
cprasad1 and agavra authored May 27, 2021
1 parent 6bd35f4 commit cef2246
Showing 1 changed file with 97 additions and 70 deletions.
167 changes: 97 additions & 70 deletions docs/developer-guide/ksqldb-reference/select-pull-query.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,98 +14,125 @@ Synopsis

```sql
SELECT select_expr [, ...]
FROM aggregate_table
WHERE key_column=key [AND ...]
[AND window_bounds];
FROM table
[ WHERE where_condition ]
[ AND window_bounds ];
```

Description
-----------

Pulls the current value from the materialized table and terminates. The result
of this statement isn't persisted in a {{ site.ak }} topic and is printed out
only in the console.

Pull queries enable you to fetch the current state of a materialized view.
Because materialized views are incrementally updated as new events arrive,
pull queries run with predictably low latency. They're a great match for
request/response flows. For asynchronous application flows, see
Pulls the current value from the materialized view and terminates. The result
of this statement is not persisted in a {{ site.ak }} topic and is printed out
only in the console. Pull queries run with predictably low latency because
materialized views are incrementally updated as new events arrive.
They are a great match for request/response flows. For asynchronous application flows, see
[Push Queries](select-push-query.md).

Execute a pull query by sending an HTTP request to the ksqlDB REST API, and
You can execute a pull query by sending an HTTP request to the ksqlDB REST API, and
the API responds with a single response.

The WHERE clause must contain a value for each primary-key column to retrieve and may
optionally include bounds on `WINDOWSTART` and `WINDOWEND` if the materialized table is windowed.
For more information, see
[Time and Windows in ksqlDB](../../concepts/time-and-windows-in-ksqldb-queries.md).
- Pull queries are expressed using a strict subset of ANSI SQL.
- You can issue a pull query against any table that was created by a
[CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement.
- Currently, we do not support pull queries against tables created by using a [CREATE TABLE](../../ksqldb-reference/create-table) statement.
- Pull queries do not support `JOIN`, `PARTITION BY`, `GROUP BY` and `WINDOW` clauses (but can query materialized tables that contain those clauses)

WHERE Clause Guidelines
-----------------------

By default, only key lookups are enabled. They have the following requirements:
- Key column(s) must use an equality comparison to a literal (e.g. KEY = 'abc').
- On windowed tables, WINDOWSTART and WINDOWEND can be optionally compared to literals.
For more information on windowed tables, see [Time and Windows in ksqlDB](../../concepts/time-and-windows-in-ksqldb-queries.md).

You can loosen the restrictions on the `WHERE` clause, or eliminate the `WHERE` clause altogether,
by enabling table scans in your current CLI session with the command `SET 'ksql.query.pull.table.scan.enabled'='true';`.
Table scans can also be enabled by default by setting a server configuration property with
`ksql.query.pull.table.scan.enabled=true`. Once table scans are enabled, the following additional expressions are allowed:
- Key column(s) using range comparisons to literals.
- Non-key columns to be used alone, without key references.
- Columns to be compared to other columns.
- References to subsets of columns from a multi-column key.
- Complex expressions without direct column references (e.g. instr(NAME_COL, 'hello') > 0).

Example
-------
!!! note
Table scan based queries are just the next incremental step for ksqlDB pull queries.
In future releases, we will continue pushing the envelope of new query capabilities and
greater performance and efficiency.

Examples
--------
Pull queries against a table `TOP_TEN_RANKS` created by using a
[CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement:
First, create a table named `GRADES` by using a [CREATE TABLE](../../ksqldb-reference/create-table)
statement:
```sql
SELECT * FROM pageviews_by_region
WHERE regionId = 'Region_1'
AND 1570051876000 <= WINDOWSTART AND WINDOWEND <= 1570138276000;
CREATE TABLE GRADES (ID INT PRIMARY KEY, GRADE STRING, RANK INT)
WITH (kafka_topic = 'test_topic', value_format = 'JSON', partitions = 4);
```

If the `pageviews_by_region` table was created as an aggregation of multiple columns,
then each key column must be present in the WHERE clause. The following example shows how to
query the table if `countryId` and `regionId` where both key columns:

Then, create a derived table named `TOP_TEN_RANKS` by using a
[CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement:
```sql
SELECT * FROM pageviews_by_region
WHERE countryId = 'USA' AND regionId = 'Region_1'
AND 1570051876000 <= WINDOWSTART AND WINDOWEND <= 1570138276000;
CREATE TABLE TOP_TEN_RANKS
AS SELECT ID, RANK
FROM GRADES
WHERE RANK <= 10;
```

When writing logical expressions using `WINDOWSTART` or `WINDOWEND`, you can use ISO-8601
formatted datestrings to represent date times. For example, the previous
query is equivalent to the following:

If you want to look up only the student with `ID = 5` in the `TOP_TEN_RANKS` table using a pull query:
```sql
SELECT * FROM pageviews_by_region
WHERE regionId = 'Region_1'
AND '2019-10-02T21:31:16' <= WINDOWSTART AND WINDOWEND <= '2019-10-03T21:31:16';
SELECT * FROM TOP_TEN_RANKS
WHERE ID = 5;
```

You can specify time zones within the datestring. For example,
`2017-11-17T04:53:45-0330` is in the Newfoundland time zone. If no time zone is
specified within the datestring, then timestamps are interpreted in the UTC
time zone.

If no bounds are placed on `WINDOWSTART` or `WINDOWEND`, rows are returned for all windows
in the windowed table.

Also, you can issue a pull query against a derived table that was created by using the [CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement.


After enabling table scans, you can fetch the current state of your `TOP_TEN_RANKS` table using a pull query:
```sql
CREATE TABLE GRADES (ID INT PRIMARY KEY, GRADE STRING, RANK INT)
WITH (kafka_topic = 'test_topic', value_format = 'JSON', partitions = 1);
SELECT * FROM TOP_TEN_RANKS;
```
If you want to look up the students whose ranks lie in the range `(4, 8)`:
```sql
SELECT * FROM TOP_TEN_RANKS
WHERE ID > 4 AND ID < 8;
```
Create a derived table, named
`TOP_TEN_RANKS`, by using a [CREATE TABLE AS SELECT](../../ksqldb-reference/create-table-as-select) statement:

```sql
CREATE TABLE TOP_TEN_RANKS
AS SELECT ID, RANK
FROM GRADES
WHERE RANK <= 10;
```
You can fetch the current state of your materialized view, which is
the `TOP_TEN_RANKS` derived table, by using a pull query:

Pull queries against a table `INNER_JOIN` that is created by joining multiple tables:
```sql
SELECT * FROM TOP_TEN_RANKS;
CREATE TABLE LEFT_TABLE (ID BIGINT PRIMARY KEY, NAME varchar, VALUE bigint)
WITH (kafka_topic='left_topic', value_format='JSON', partitions=4);
```
```sql
CREATE TABLE RIGHT_TABLE (ID BIGINT PRIMARY KEY, F1 varchar, F2 bigint)
WITH (kafka_topic='right_topic', value_format='JSON', partitions=4);
```
```sql
CREATE TABLE INNER_JOIN AS SELECT L.ID, NAME, VALUE, F1, F2 FROM LEFT_TABLE L JOIN RIGHT_TABLE R ON L.ID = R.ID;
```
You can fetch the current state of your table `INNER_JOIN` by using a pull query:
```sql
SELECT * FROM INNER_JOIN [ WHERE where_condition ];
```
The following statement looks up only the student with `ID = 5` in the derived table:

Pull queries against a windowed table `NUMBER_OF_TESTS` created by aggregating a stream `STUDENTS`:
```sql
SELECT * FROM TOP_TEN_RANKS
WHERE ID = 5;
CREATE STREAM STUDENTS (ID STRING KEY, SCORE INT)
WITH (kafka_topic='students_topic', value_format='JSON', partitions=4);
```
```sql
CREATE TABLE NUMBER_OF_TESTS AS
SELECT ID, COUNT(1) AS COUNT
FROM STUDENTS
WINDOW TUMBLING(SIZE 1 SECOND)
GROUP BY ID;
```
Look up the number of tests taken by a student with `ID='10'`:
```sql
SELECT *
FROM NUMBER_OF_TESTS
WHERE ID='10';
```
Look up the number of tests taken by a student with `ID='10'`
in the window range `100 <= WindowStart AND WindowEnd <= 16000`:
```sql
SELECT *
FROM NUMBER_OF_TESTS
WHERE ID='10' AND 100 <= WindowStart AND WindowEnd <= 16000;
```
!!! note
Currently, tables derived using Table-Table joins aren't queryable directly. To derive a queryable table, you can do:
`CREATE TABLE QUERYABLE_JOIN_TABLE AS SELECT * FROM JOIN_TABLE;` and then issue pull queries against `QUERYABLE_JOIN_TABLE`.

0 comments on commit cef2246

Please sign in to comment.