Support implicit repartitioning of table sources during joins #4666

Open
fish-face opened this issue Feb 28, 2020 · 0 comments

We have database changelog messages coming through Kafka via Connect, and we create KSQL tables over them. Since the source is (somewhat) normalised, we need joins to do basically anything.

The only sensible way to set up message keys in this scenario is to use the primary key from the table. Hence, when joining, one side of the join generally has to be re-keyed. This is quite annoying at the moment; see #2356. It is even worse if there is a type mismatch (Connect creates string keys but the primary keys are still ints, and there is no longer any implicit conversion to string when re-keying in KSQL), because you need an entire extra stream to CAST the column, as this can no longer be done in the same step as a PARTITION BY.
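To make the extra steps concrete, here is a rough sketch of the workaround, written against the KSQL syntax of early 2020 as I understand it. The topic, stream, table, and column names are all made up for illustration, and a one-to-one relation is assumed so that re-keying does not collapse rows.

```sql
-- Hypothetical changelog topic written by Connect for a "profiles" table.
-- Message key: the primary key (id) as a string. user_id is an INT in the value.
CREATE STREAM profiles_src (id INT, user_id INT, bio STRING)
  WITH (KAFKA_TOPIC='db.profiles', VALUE_FORMAT='JSON');

-- Step 1: an extra stream whose only job is to CAST the join column,
-- because the CAST cannot be combined with PARTITION BY in one query.
CREATE STREAM profiles_cast AS
  SELECT id, CAST(user_id AS STRING) AS user_key, bio
  FROM profiles_src;

-- Step 2: re-key the stream on the casted column.
CREATE STREAM profiles_rekeyed AS
  SELECT *
  FROM profiles_cast
  PARTITION BY user_key;

-- Step 3: declare a table over the re-keyed topic so it can be used
-- in a table-table join on its key.
CREATE TABLE profiles_by_user (id INT, user_key STRING, bio STRING)
  WITH (KAFKA_TOPIC='PROFILES_REKEYED', VALUE_FORMAT='JSON', KEY='user_key');
```

That is three statements and two intermediate topics just to join on a column that is not the primary key.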

Since #4278 it is now possible to perform some implicit repartitioning of joins on the stream side, which makes a lot of sense and greatly reduces the abstraction leak in this scenario. However, if you try to make use of this in a table-table join, you get:

Cannot repartition a TABLE source. If this is a join, make sure that the criteria uses the TABLE key ROWKEY instead of <column/function>

Supporting an implicit repartition here would help immensely.
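For illustration, a table-table join of roughly the following shape is the kind of query that runs into the error quoted above. The table and column names are hypothetical, and the sketch assumes the types on both sides of the join condition line up.

```sql
-- Hypothetical tables over Connect changelog topics, each keyed by its own
-- primary key (the message key, exposed as ROWKEY).
CREATE TABLE users (id STRING, name STRING)
  WITH (KAFKA_TOPIC='db.users', VALUE_FORMAT='JSON', KEY='id');

CREATE TABLE profiles (id STRING, user_id STRING, bio STRING)
  WITH (KAFKA_TOPIC='db.profiles', VALUE_FORMAT='JSON', KEY='id');

-- The join condition uses profiles.user_id, which is not the key of
-- "profiles", so that side would need to be repartitioned.
CREATE TABLE users_with_profiles AS
  SELECT u.name, p.bio
  FROM users u
  JOIN profiles p ON u.ROWKEY = p.user_id;
```

With an implicit repartition, the last statement would be all that is needed.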

The main alternative is to make repartitioning of tables less annoying to begin with, i.e. fixing #2365. Note, however, that if a CAST is also required, it cannot currently be done in the same step as a PARTITION BY, so there would still be an additional annoyance compared with the stream situation.
