More intuitive way to define Table KEY #804

blueedgenick · 2018-02-27T09:09:13Z

Currently the specification of which field from the message-value in a Table contains the same value as the message-key is rather non-intuitive and confuses many users. Additionally, it is the case that it may be impossible to specify this metadata in a way which KSQL will accept, given recent tightening of the checks around presence of the KEY='foo' stanza - if, for example, there is no column in the message values which happens to duplicate the contents of the message key!

There are 2 components to this request, which may make sense to consider together as an improvement to this situation:

replace the current way of specifying the KEY, inside of the WITH clause, with a mechanism which feels more natural to a SQL-savvy user. Suggest something like this CREATE TABLE foo { keycol int KEY, othercol varchar}.
instead of requiring that the key be also present in the value, allow the syntax for (1) above to instead mean "keycol is the alternative name by which i will refer to the ROWKEY. If there is a field inside of my message-value with this name (e.g. it's a JSON message with a 'keycol' entry) then the engine can read the value from either there or from the actual message key, whichever is most convenient/optimal".

The text was updated successfully, but these errors were encountered:

bluemonk3y · 2018-02-27T10:35:30Z

Im +1 for this - it follows expected SQL semantics - https://www.w3schools.com/sql/sql_primarykey.asp

big-andy-coates · 2018-04-11T09:47:09Z

It feels to me like there are two (potential) uses for the WITH KEY clause:

To provide a more human friendly name for the key of the Kafka record, i.e. to provide an alias for ROWKEY. (i.e. as suggested above).
To inform KSQL that the Kafka record's key is also available in the value, so that KSQL can avoid unnecessary repartitioning should the user submit statements that repartition by the field in the value that matches the key. (i.e. the current behaviour).

I think the second use, (i.e. current behaviour), has a lot of potential to introduce errors into user's applications and is less intuitive. We wouldn't want to add the overhead of checking that both copies of the key in the Kafka record matched, but if there were to differ, then results would be 'undefined'.

I'm a +1 on switching WITH (KEY = 'foo') to just mean its an alias of ROWKEY. In the situation where there is a field within the value that matches the key, then the use of WITH (KEY = 'foo') should read the key from the Kafka record into the 'foo' column, i.e. it should ignore the 'foo' field in the value. Semantically, as the two values should match, this should make no difference to the running application, but the scope for error is much reduced. This would also remove the need for the 'with key' clause in CT statements.

big-andy-coates · 2018-04-30T16:00:38Z

@hjafarpour can you shed light on why the current implementation is the way it is?

ybyzek · 2018-07-26T18:17:45Z

The current 3-step process to rekey a topic for a table prevents people from unintentionally repartitioning because they don’t understand the implications. For usability, it makes sense to simplify this into 1-step and caveat repartitioning.

apurvam · 2018-08-03T17:52:42Z

Another thing to think about here is what if you do a CREATE TABLE foobar AS SELECT col1, col2 FROM some_table;, and you don't select the key column? What is the key for the table? As we see in #1686, we won't set the key field for the foobar table above.

Any fixing of the key field specification for the table should also naturally translate to the keys for tables created by CTAS statements.

big-andy-coates · 2019-04-29T13:20:00Z

See #2745, which drops the requirement for having a KEY defined on a table and makes it just an optimisation.

The second part falls into work planned for structured keys.

So... I'm going to close this. Feel free to reopen if you think it should still be open.

apurvam added the enhancement label Feb 27, 2018

This was referenced Feb 27, 2018

KSQL joins will fail silently if a table's topic message key does not match the declared KSQL Table key #749

Open

Add support for generic message keys (e.g. message keys in Avro or JSON format, STRUCT) #824

Closed

rodesai added the record-keys label Mar 8, 2018

miguno mentioned this issue Mar 29, 2018

Docs: improve KEY and TIMESTAMP descriptions for CREATE STREAM and CREATE TABLE #1094

Merged

big-andy-coates mentioned this issue Apr 11, 2018

Validation of join predicate should verify that the table operand is the key #965

Open

apurvam mentioned this issue Aug 3, 2018

NPE when joining table to result of a table table join #1686

Closed

rmoff mentioned this issue Jan 3, 2019

Support repartitioning of topic data for stream-table joins more elegantly #2314

Open

big-andy-coates closed this as completed Apr 29, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More intuitive way to define Table KEY #804

More intuitive way to define Table KEY #804

blueedgenick commented Feb 27, 2018 •

edited by miguno

Loading

bluemonk3y commented Feb 27, 2018

big-andy-coates commented Apr 11, 2018

big-andy-coates commented Apr 30, 2018

ybyzek commented Jul 26, 2018

apurvam commented Aug 3, 2018

big-andy-coates commented Apr 29, 2019 •

edited

Loading

More intuitive way to define Table KEY #804

More intuitive way to define Table KEY #804

Comments

blueedgenick commented Feb 27, 2018 • edited by miguno Loading

bluemonk3y commented Feb 27, 2018

big-andy-coates commented Apr 11, 2018

big-andy-coates commented Apr 30, 2018

ybyzek commented Jul 26, 2018

apurvam commented Aug 3, 2018

big-andy-coates commented Apr 29, 2019 • edited Loading

blueedgenick commented Feb 27, 2018 •

edited by miguno

Loading

big-andy-coates commented Apr 29, 2019 •

edited

Loading