Project STRUCT for use in PARTITION BY clause #3876

robinroos · 2019-11-16T15:43:35Z

I need to re-key data through a KSQL persistent query.

The PARTITION BY clause takes only a single field, but I need to partition on a 3-field structure {Book, Pair, SettleDate}.

As an inelegant quick win, I added precisely that STRUCT into the original record, named PositionKey, and I can PARTITION BY PositionKey.

My question is:

Is it possible to project a STRUCT from some fields of the query, so as to have a single STRUCT field by which to partition?

e.g.,

create stream near_position_change as select STRUCT<Book, Pair, SettleDate> as NewKey, BaseAmount, QuotedAmount, ... FROM trades PARTITION BY NewKey;

Thanks, Robin.


rmoff · 2019-12-10T15:56:20Z

Sounds like #2147?

robinroos · 2019-12-11T22:36:02Z

This (#3876) is certainly dependent upon #2147.

My specific hope is that the STRUCT so projected:

- can be used in a PARTITION BY clause, and
- the resulting message key is entirely equivalent to an (anonymous) Avro record of the same types in the same field order

In this regard, ... SELECT ... STRUCT<a,b,c> AS NewKey ... PARTITION BY NewKey would cause messages to be partitioned in exactly the same manner as if the messages had been published to the topic by a producer with an explicit Avro IDL-generated instance as the message key.

My presumption is that the name of the Avro record, and the names of the fields in the Avro record, are of no consequence when an instance of such a record is used as a message key, only the ordered types and values being of significance. If that presumption is not correct then it may be necessary to provide AS clauses for each of a,b,c within the STRUCT<> in order to align key field names (perhaps already covered in #2147).
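To make that presumption concrete, here is a minimal Python sketch of how the Avro binary encoding lays out a record made only of strings. It hand-rolls the encoding rather than using an Avro library, and the helper names (`zigzag_varint`, `avro_string`, `avro_record`) and the sample values are hypothetical. It also assumes all three key fields are strings, which the thread does not state. The point it illustrates is that the encoded record body is just the field values in declared order, with no record name or field names in the bytes.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an integer the way Avro encodes a long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro string: length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def avro_record(fields) -> bytes:
    """Avro record body: each field value encoded in declared order.

    `fields` is an ordered list of (name, value) pairs. The names are
    deliberately ignored, because the binary encoding does not carry them.
    """
    return b"".join(avro_string(value) for _name, value in fields)


# Hypothetical sample values; same values and order, different field names.
key_a = avro_record([("Book", "FX1"), ("Pair", "EURUSD"), ("SettleDate", "2019-11-18")])
key_b = avro_record([("a", "FX1"), ("b", "EURUSD"), ("c", "2019-11-18")])
print(key_a == key_b)
```

This only covers the raw record body. Whether two keys end up byte-identical on the topic also depends on any framing the serializer adds, such as a schema identifier prefix, which this sketch does not model; if two differently named records resolve to different schema identifiers, the framed key bytes would differ even though the bodies match.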

Naturally the same should apply to keys used for partitioning aggregation results c/o GROUP BY, and I note that AS clauses are likely not appropriate there. It seems that currently GROUP BY gives rise to a message key whose internal structure is unclear, but whose string representation uses |+| as a delimiter between field values. Let me know if I should raise a separate issue specifically for the GROUP BY case, which involves implicit projection.
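For the GROUP BY case, a small Python sketch of the string form described above may help show why a delimiter-joined key is awkward. This models only the observed string representation with `|+|` between field values; it is not KSQL's implementation, and the function names and sample values are hypothetical.

```python
KEY_DELIMITER = "|+|"  # delimiter observed in the string form of GROUP BY keys


def group_by_key(values) -> str:
    """Join the grouped field values into a single delimited string key."""
    return KEY_DELIMITER.join(str(v) for v in values)


def split_key(key: str):
    """Naive inverse: split the key back into its field values."""
    return key.split(KEY_DELIMITER)


# Hypothetical sample values for Book, Pair, SettleDate.
key = group_by_key(["FX1", "EURUSD", "2019-11-18"])
print(key)

# A value that itself contains the delimiter cannot be recovered unambiguously.
print(split_key(group_by_key(["A|+|B", "C"])))
```

A key built this way carries no types and no field boundaries beyond the delimiter, so a consumer cannot treat it as equivalent to a structured Avro key of the same fields.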

Whilst I would choose always to use Avro for keys (pending #3986, #3533), I appreciate that not all Confluent/Kafka users necessarily share that view.

pbettler-CBeds · 2022-03-31T14:58:57Z

+1 on partitioning and initializing a new struct in the same line. This would be a very helpful enhancement.

robinroos added the question label Nov 16, 2019

agavra added the enhancement label Nov 18, 2019
