chore: support window bound columns in selection of windowed group by #4450
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Fixes bits of #4397
KSQL currently lets you take a non-windowed stream and perform a windowed group by:
Which is essentially grouping by not just
something
, but also implicitly by the window bounds.This might be more correctly written with a Tumbling table function:
Where the Tumbling table function returns one row for each row in
S
, with the addition of thewindowstart
andwindowend
columns. (Note: Hopping and session table functions are also possible, though in the case of the latter the table function would also emit retractions).In a correct SQL model
windowstart
andwindowend
would therefore be available as fields within the selection, e.g.This change makes such awesomeness possible.
How to review.
The changes, TBH, feel a little hacky. They're too spread around the code base. However, without some restructuring there's little choice but to hack the window bounds in. The planned structured keys work will improve things, and we probably need to rethink windowing at some point. That said, reviewing notes are:
QueryAnalyser
has been enhanced to also pass WHERE expressions to the aggregate analyser, so that it can throw on any window bounds columns.AggregateAnalyzer
has been enhanced to throw if the window bound columns are used anywhere but in the select of windowed group by statements. (We'll need a Streams change to support anything else).Analysis.getFromSourceSchemas
now takes a boolean flag to indicate if the returned schemas should, for windowed group bys, include the window bounds columns. Basically, the window bound columns are only available post the aggregation. A lot of the code passestrue
to get the schema with the window bounds in it, and relying on the fact that theAggregateAnalyzer
has already rejected anything invalid.AggregateNode
andLogicalPlanner
have some magic to handle the special window bounds columns, where appropriate.AggregateParamsFactory
now takes a flag to indicate a windowed group by, and adds appropriate window bounds columns to the result schema.StreamAggregateBuilder
adds a newtransformValues
step to populate the window bounds values from the windowed key.Note: there are some expected topology changes, but these will mostly be reverted by a follow on PR and are also backwards compatible. The topologies also include a dubiously named
Aggregate-Aggregate-WindowSelect2
node. This will also go in the follow up PR.Testing done
Suitable tests added for pull, push and persistent queries.
Reviewer checklist