Project STRUCT for use in PARTITION BY clause #3876

Open
robinroos opened this issue Nov 16, 2019 · 3 comments

@robinroos

I need to re-key data through a KSQL persistent query.

The PARTITION BY clause takes only a single field, but I need to partition on a 3-field structure {Book, Pair, SettleDate}.

As an inelegant quick win, I added precisely that STRUCT to the original record, named PositionKey, and I can PARTITION BY PositionKey.

My question is:

Is it possible to project a STRUCT from some fields of the query, so as to have a single STRUCT field by which to partition?

e.g.,

create stream near_position_change as select STRUCT<Book, Pair, SettleDate> as NewKey, BaseAmount, QuotedAmount, ... FROM trades PARTITION BY NewKey;

Thanks, Robin.

@rmoff
Contributor

rmoff commented Dec 10, 2019

Sounds like #2147?

@robinroos
Author

This (#3876) is certainly dependent upon #2147.

My specific hope is that the STRUCT so projected:

  1. can be used in a PARTITION BY clause, and
  2. yields a message key that is entirely equivalent to an (anonymous) Avro record of the same types in the same field order

In this regard, ... SELECT ... STRUCT<a,b,c> AS NewKey ... PARTITION BY NewKey would cause messages to be partitioned in exactly the same manner as if the messages had been published to the topic by a producer with an explicit Avro IDL-generated instance as the message key.

My presumption is that the name of the Avro record, and the names of the fields in the Avro record, are of no consequence when an instance of such a record is used as a message key, only the ordered types and values being of significance. If that presumption is not correct then it may be necessary to provide AS clauses for each of a,b,c within the STRUCT<> in order to align key field names (perhaps already covered in #2147).
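
A rough way to check this presumption, as a sketch rather than anything taken from KSQL itself: Avro's binary encoding writes a record as its field values in declared order, with no record or field names in the payload. The hand-rolled encoder below assumes the key is (string, string, int); the schema dictionaries, the sample values and the schema IDs 1 and 2 are all hypothetical. It also frames the payload the way Confluent's Schema Registry serializers do (magic byte 0, then a 4-byte schema ID), which is where differently named schemas could still produce different key bytes.

```python
import struct


def zigzag_varint(n: int) -> bytes:
    """Encode an Avro int/long: zigzag, then base-128 variable-length bytes."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate field encodings in schema order; no names are written."""
    out = bytearray()
    for name, avro_type in schema["fields"]:
        value = record[name]
        out += avro_string(value) if avro_type == "string" else zigzag_varint(value)
    return bytes(out)


def confluent_frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix the payload with magic byte 0 and a 4-byte big-endian schema ID."""
    return struct.pack(">bI", 0, schema_id) + payload


# Hypothetical schemas: same ordered types, different record and field names.
producer_schema = {
    "name": "PositionKey",
    "fields": [("Book", "string"), ("Pair", "string"), ("SettleDate", "int")],
}
projected_schema = {
    "name": "NewKey",
    "fields": [("a", "string"), ("b", "string"), ("c", "int")],
}

# Hypothetical values; SettleDate is assumed to be a days-since-epoch int.
producer_key = encode_record(
    producer_schema, {"Book": "FX1", "Pair": "EURUSD", "SettleDate": 18216}
)
projected_key = encode_record(
    projected_schema, {"a": "FX1", "b": "EURUSD", "c": 18216}
)

same_payload = producer_key == projected_key
# Assumed schema IDs 1 and 2: the framed bytes differ even if the payload matches.
framed_differ = confluent_frame(1, producer_key) != confluent_frame(2, projected_key)
```

If this sketch is right, the names do not affect the Avro payload itself, but two schemas registered under different IDs would still serialize to different key bytes, and Kafka's default partitioner hashes the serialized key bytes.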

Naturally the same should apply to keys used for partitioning aggregation results via GROUP BY, and I note that AS clauses are likely not appropriate there. It seems that currently GROUP BY gives rise to a message key whose internal structure is unclear, but whose string representation uses |+| as a delimiter between field values. Let me know if I should raise a separate issue specifically for the GROUP BY case, which involves implicit projection.
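
To illustrate why a delimited string key is a weaker contract than a structured one, here is a minimal sketch that mimics only the string form described above. It is not KSQL's implementation; the function name and sample values are hypothetical.

```python
DELIMITER = "|+|"


def delimited_key(values) -> str:
    """Join the grouping values into one string, as the observed keys appear to be."""
    return DELIMITER.join(str(v) for v in values)


key = delimited_key(["FX1", "EURUSD", 18216])

# Splitting recovers the values only as strings; the int type is lost.
fields = key.split(DELIMITER)

# A value that itself contains the delimiter makes two different keys collide.
ambiguous = delimited_key(["A|+|B", "C"]) == delimited_key(["A", "B|+|C"])
```

A key built this way carries neither types nor field boundaries, so it cannot match the bytes of an Avro record key for the same values.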

Whilst I would choose always to use Avro for keys (pending #3986, #3533), I appreciate that not all Confluent/Kafka users necessarily share that view.

@pbettler-CBeds

+1 on partitioning and initializing a new struct in the same line. Would be a very helpful enhancement.
