Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KSQL should allow the user to specify an expression for the stream side of the JOIN predicate #596

Closed
rmoff opened this issue Jan 5, 2018 · 1 comment · Fixed by #4440

Comments

@rmoff
Copy link
Contributor

rmoff commented Jan 5, 2018

Trying to join between these two objects:

ksql> DESCRIBE LOGON;

 Field       | Type
-----------------------------------------
 ROWTIME     | BIGINT           (system)
 ROWKEY      | VARCHAR(STRING)  (system)
 OP_TYPE     | VARCHAR(STRING)
 OP_TS       | VARCHAR(STRING)
 CURRENT_TS  | VARCHAR(STRING)
 POS         | VARCHAR(STRING)
 LOGON_ID    | DOUBLE
 CUSTOMER_ID | DOUBLE
 LOGON_DATE  | VARCHAR(STRING)
-----------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;
ksql> DESCRIBE CUSTOMERS;

 Field             | Type
-----------------------------------------------
 ROWTIME           | BIGINT           (system)
 ROWKEY            | VARCHAR(STRING)  (system)
 OP_TYPE           | VARCHAR(STRING)
 OP_TS             | VARCHAR(STRING)
 CURRENT_TS        | VARCHAR(STRING)
 POS               | VARCHAR(STRING)
 CUSTOMER_ID       | BIGINT
[...]

CUSTOMER_ID is BIGINT on one, DOUBLE on the other. This difference in data types prevents the join from working:

ksql> SELECT L.LOGON_ID,C.CUST_FIRST_NAME FROM LOGON L LEFT OUTER JOIN CUSTOMERS C ON L.CUSTOMER_ID = C.CUSTOMER_ID WHERE L.CUSTOMER_ID =75001 LIMIT 2;
226236.0 | null
47514.0 | null
LIMIT reached for the partition.
Query terminated
ksql>

This join should return rows. The CUSTOMER table has a row for the id:

ksql> SELECT CUST_FIRST_NAME FROM CUSTOMERS WHERE CUSTOMER_ID =75001 LIMIT 2;
albert

and the LOGON stream has several rows for the same id:

ksql> SELECT LOGON_ID FROM LOGON WHERE CUSTOMER_ID =75001 LIMIT 2;
226236.0
47514.0
LIMIT reached for the partition.
Query terminated
ksql>

If I try to CAST the column on the fly, it fails :

ksql> SELECT L.LOGON_ID,C.CUST_FIRST_NAME FROM LOGON L LEFT OUTER JOIN CUSTOMERS C ON CAST(L.CUSTOMER_ID AS BIGINT) = C.CUSTOMER_ID WHERE L.CUSTOMER_ID =75001 LIMIT 2;
 Line: 1, Col: 111 : Invalid join criteria (CAST(L.CUSTOMER_ID AS BIGINT) = C.CUSTOMER_ID). Key for L is not set correctly.
ksql>

The only way around this is to create a derived stream in which the datatype is corrected:

ksql> CREATE STREAM FOO AS SELECT CAST(LOGON_ID AS BIGINT) AS LOGON_ID, CAST(CUSTOMER_ID AS BIGINT) AS CUSTOMER_ID, LOGON_DATE FROM LOGON;

 Message
----------------------------
 Stream created and running
----------------------------

and then use this in the join:

ksql> SELECT L.LOGON_ID,C.CUST_FIRST_NAME FROM FOO L LEFT OUTER JOIN CUSTOMERS C ON L.CUSTOMER_ID = C.CUSTOMER_ID WHERE L.CUSTOMER_ID =75001 LIMIT 2;
226236 | albert
166662 | albert
LIMIT reached for the partition.
Query terminated
ksql>
  1. KSQL should be able to join on a DOUBLE and BIGINT
  2. CAST should be valid in a JOIN clause
@rodesai
Copy link
Contributor

rodesai commented Oct 8, 2018

I think this can be generalized to supporting any expression on the stream column in the join predicate.

@rodesai rodesai changed the title KSQL does not allow CAST in a JOIN clause KSQL should allow the user to specify an expression for the stream side of the JOIN predicate Oct 8, 2018
big-andy-coates added a commit to big-andy-coates/ksql that referenced this issue Feb 4, 2020
fixes: confluentinc#596

Functionality to join on an expression was added previously.
big-andy-coates added a commit that referenced this issue Feb 7, 2020
* test: add test to prove #596 is fixed

fixes: #596

Functionality to join on an expression was added previously.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants