Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More intuitive way to define Table KEY #804

Closed
blueedgenick opened this issue Feb 27, 2018 · 6 comments
Closed

More intuitive way to define Table KEY #804

blueedgenick opened this issue Feb 27, 2018 · 6 comments

Comments

@blueedgenick
Copy link
Contributor

blueedgenick commented Feb 27, 2018

Currently the specification of which field from the message-value in a Table contains the same value as the message-key is rather non-intuitive and confuses many users. Additionally, it is the case that it may be impossible to specify this metadata in a way which KSQL will accept, given recent tightening of the checks around presence of the KEY='foo' stanza - if, for example, there is no column in the message values which happens to duplicate the contents of the message key!

There are 2 components to this request, which may make sense to consider together as an improvement to this situation:

  1. replace the current way of specifying the KEY, inside of the WITH clause, with a mechanism which feels more natural to a SQL-savvy user. Suggest something like this CREATE TABLE foo { keycol int KEY, othercol varchar}.
  2. instead of requiring that the key be also present in the value, allow the syntax for (1) above to instead mean "keycol is the alternative name by which i will refer to the ROWKEY. If there is a field inside of my message-value with this name (e.g. it's a JSON message with a 'keycol' entry) then the engine can read the value from either there or from the actual message key, whichever is most convenient/optimal".
@bluemonk3y
Copy link

Im +1 for this - it follows expected SQL semantics - https://www.w3schools.com/sql/sql_primarykey.asp

@big-andy-coates
Copy link
Contributor

It feels to me like there are two (potential) uses for the WITH KEY clause:

  1. To provide a more human friendly name for the key of the Kafka record, i.e. to provide an alias for ROWKEY. (i.e. as suggested above).
  2. To inform KSQL that the Kafka record's key is also available in the value, so that KSQL can avoid unnecessary repartitioning should the user submit statements that repartition by the field in the value that matches the key. (i.e. the current behaviour).

I think the second use, (i.e. current behaviour), has a lot of potential to introduce errors into user's applications and is less intuitive. We wouldn't want to add the overhead of checking that both copies of the key in the Kafka record matched, but if there were to differ, then results would be 'undefined'.

I'm a +1 on switching WITH (KEY = 'foo') to just mean its an alias of ROWKEY. In the situation where there is a field within the value that matches the key, then the use of WITH (KEY = 'foo') should read the key from the Kafka record into the 'foo' column, i.e. it should ignore the 'foo' field in the value. Semantically, as the two values should match, this should make no difference to the running application, but the scope for error is much reduced. This would also remove the need for the 'with key' clause in CT statements.

@big-andy-coates
Copy link
Contributor

@hjafarpour can you shed light on why the current implementation is the way it is?

@ybyzek
Copy link
Contributor

ybyzek commented Jul 26, 2018

The current 3-step process to rekey a topic for a table prevents people from unintentionally repartitioning because they don’t understand the implications. For usability, it makes sense to simplify this into 1-step and caveat repartitioning.

@apurvam
Copy link
Contributor

apurvam commented Aug 3, 2018

Another thing to think about here is what if you do a CREATE TABLE foobar AS SELECT col1, col2 FROM some_table;, and you don't select the key column? What is the key for the table? As we see in #1686, we won't set the key field for the foobar table above.

Any fixing of the key field specification for the table should also naturally translate to the keys for tables created by CTAS statements.

@big-andy-coates
Copy link
Contributor

big-andy-coates commented Apr 29, 2019

See #2745, which drops the requirement for having a KEY defined on a table and makes it just an optimisation.

The second part falls into work planned for structured keys.

So... I'm going to close this. Feel free to reopen if you think it should still be open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants