We receive database changelog messages through Kafka via Connect and create KSQL tables over them. Since the source is (somewhat) normalised, we need joins to do basically anything.
The only sensible way to set up message keys in this scenario is to use the primary key from the database table. Hence, when joining, one side of the join generally has to be re-keyed. This is quite annoying at the moment; see #2356. A type mismatch makes it even worse: Connect creates `string` keys while the primary key columns are still `int`s, and KSQL no longer implicitly converts to `string` when re-keying. You therefore need an entire extra stream just to `CAST` the column, because the cast can no longer be done in the same step as a `PARTITION BY`.
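For illustration, the re-keying workaround described above might look roughly like the following. This is a sketch, not a tested recipe: every stream, table, topic, and column name is hypothetical, the value format is assumed to be JSON, `customer_id` is assumed to be unique per row (so re-keying does not collapse rows), and the syntax (implicit `ROWKEY`, the `KEY` property) assumes a ksqlDB version from around the time of this issue.

```sql
-- All names below are hypothetical; syntax assumes a ROWKEY-era ksqlDB release.

-- 1. Read the changelog topic as a stream so that it can be repartitioned.
CREATE STREAM profiles_src (id INT, customer_id INT, bio STRING)
  WITH (KAFKA_TOPIC='db.customer_profiles', VALUE_FORMAT='JSON');

-- 2. Extra stream whose only job is to CAST the INT column to STRING,
--    so that it matches the STRING key of the table on the other side.
CREATE STREAM profiles_cast AS
  SELECT id, CAST(customer_id AS STRING) AS customer_key, bio
  FROM profiles_src;

-- 3. Re-key on the cast column in a separate step.
CREATE STREAM profiles_rekeyed
  WITH (KAFKA_TOPIC='profiles_rekeyed') AS
  SELECT * FROM profiles_cast
  PARTITION BY customer_key;

-- 4. Re-import the repartitioned topic as a table that can be joined.
CREATE TABLE profiles_by_customer (id INT, customer_key STRING, bio STRING)
  WITH (KAFKA_TOPIC='profiles_rekeyed', VALUE_FORMAT='JSON', KEY='customer_key');
```

That is four statements and two intermediate topics for what is conceptually a single join condition.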
Since #4278 it is now possible to perform some implicit repartitioning of joins on the stream side, which makes a lot of sense and greatly reduces the abstraction leak in this scenario. However, if you try to make use of this in a table-table join, you get:
Cannot repartition a TABLE source. If this is a join, make sure that the criteria uses the TABLE key ROWKEY instead of <column/function>
Supporting an implicit repartition here would help immensely.
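As a hedged example, a table-table join of roughly this shape is the kind of statement that gets rejected with the error quoted above. The table, topic, and column names are hypothetical and the value format is assumed to be JSON; the point is only that the left-hand join column is an ordinary column rather than the table's key.

```sql
-- Hypothetical tables over Connect changelog topics.
-- customers is keyed by its STRING primary key (the implicit ROWKEY).
CREATE TABLE customers (name STRING)
  WITH (KAFKA_TOPIC='db.customers', VALUE_FORMAT='JSON');

-- customer_profiles is keyed by its own primary key;
-- customer_key is an ordinary value column, not the table key.
CREATE TABLE customer_profiles (customer_key STRING, bio STRING)
  WITH (KAFKA_TOPIC='db.customer_profiles', VALUE_FORMAT='JSON');

-- The left side joins on a non-key column, so it would need a repartition,
-- which is not done implicitly for a TABLE source.
CREATE TABLE customer_details AS
  SELECT c.name, p.bio
  FROM customer_profiles p
  JOIN customers c ON p.customer_key = c.ROWKEY;
```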
The main alternative is to make repartitioning of tables less annoying to begin with, i.e. fixing #2365. Note, however, that if a `CAST` is also required, it cannot currently be done in the same step as a `PARTITION BY`, so there would still be an additional annoyance compared with the stream case.