Support batching for RANGE running window aggregations. Including on [databricks] #9544

Merged
merged 23 commits into from
Nov 2, 2023

mythrocks
Collaborator

This is a followup to #9489, but applies to more than FIRST().

GpuRunningWindowExec handles "running-window" aggregations, i.e. those over the frame [UNBOUNDED PRECEDING, CURRENT ROW], as long as they are ROWS-based window specifications.

But when the window frame isn't specified explicitly and an ORDER BY is present, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:

SELECT FIRST(foo) OVER (PARTITION BY bar ORDER BY goo) ...
-- Defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Without batching support, the window aggregation for such RANGE queries is done in a single batch, with an increased likelihood of OOMs.
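
To make the ROWS/RANGE distinction concrete, here is a minimal plain-Python sketch (not Spark or plugin code) of a running aggregation over one partition that is already sorted by the order-by key. It uses a running SUM rather than FIRST(), because SUM makes the difference visible when the order-by column has ties. The function names and sample data are hypothetical.

```python
from itertools import groupby


def running_sum_rows(rows):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    The frame grows by exactly one row at each step.
    """
    out, total = [], 0
    for _, value in rows:
        total += value
        out.append(total)
    return out


def running_sum_range(rows):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    All rows with the same order-by value ("peers") share one frame,
    so they all receive the same result.
    """
    out, total = [], 0
    for _, peers in groupby(rows, key=lambda r: r[0]):
        peers = list(peers)
        total += sum(value for _, value in peers)
        out.extend([total] * len(peers))
    return out


# (order_key, value) pairs for a single partition, sorted by order_key.
rows = [(1, 10), (2, 20), (2, 30), (3, 40)]
rows_result = running_sum_rows(rows)
range_result = running_sum_range(rows)
```

The two rows with order key 2 get different running sums under ROWS but the same sum under RANGE. This is why a RANGE frame cannot be cut at an arbitrary row: a cut between the two peers would leave the first one without the second one's contribution.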

This commit adds batching support for RANGE queries by including the order-by column in batching. Rows with the same partition-key and order-by-key values are then processed as part of the same batch, so fix-ups do not have to cross batches.
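
The batching idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the plugin's Scala implementation: `split_on_peer_boundaries`, `target_size`, and the sample rows are hypothetical, and the input is assumed to be already sorted by partition key and then order-by key.

```python
def split_on_peer_boundaries(rows, target_size, key):
    """Split sorted rows into batches of roughly target_size rows,
    never separating rows that share the same key."""
    batches, current = [], []
    for row in rows:
        # Cut only when the batch is full AND this row starts a new
        # (partition key, order-by key) group.
        if len(current) >= target_size and key(row) != key(current[-1]):
            batches.append(current)
            current = []
        current.append(row)
    if current:
        batches.append(current)
    return batches


# (bar, goo, foo) rows sorted by partition key `bar`, then order-by key `goo`.
rows = [
    ("a", 1, 10),
    ("a", 2, 20),
    ("a", 2, 30),
    ("a", 2, 40),
    ("a", 3, 50),
    ("b", 1, 60),
]
batches = split_on_peer_boundaries(
    rows, target_size=2, key=lambda r: (r[0], r[1])
)
```

With `target_size=2`, the first batch grows to four rows because the three rows with `("a", 2)` must stay together. A batch can therefore exceed the target size when many rows share the same partition and order-by values.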

Collaborator

@revans2 revans2 left a comment

Looking good.

@mythrocks
Collaborator Author

Build

Collaborator

@revans2 revans2 left a comment

In general it looks really good. Just want to see if there is something we can do with the testing for range windows. Probably is fine as is, but I would love it if we could get some coverage.

Review comment on integration_tests/src/main/python/window_function_test.py (outdated, resolved)
@mythrocks mythrocks self-assigned this Oct 31, 2023
This reverts commit 22aaf7c.

Best not to combine the running window ROW/RANGE tests.
@mythrocks mythrocks added the reliability label (Features to improve reliability or bugs that severely impact the reliability of the plugin) and removed the feature request label (New feature or request) Nov 1, 2023
@revans2
Collaborator

revans2 commented Nov 1, 2023

build

revans2 previously approved these changes Nov 1, 2023
Collaborator

@revans2 revans2 left a comment

Looks good to me.

@mythrocks
Collaborator Author

mythrocks commented Nov 1, 2023

Thanks for the review, @revans2. I'm adding a couple of similar new tests for the partitioned cases.
Apologies in advance: The new commit will dismiss the current review.

@mythrocks mythrocks changed the title from "[WIP] Support batching for RANGE running window aggregations" to "Support batching for RANGE running window aggregations" Nov 1, 2023
@mythrocks
Collaborator Author

Build

@mythrocks mythrocks changed the title from "Support batching for RANGE running window aggregations" to "Support batching for RANGE running window aggregations. Including on [databricks]" Nov 2, 2023
@mythrocks
Collaborator Author

Build

@revans2 revans2 merged commit 0811cb4 into NVIDIA:branch-23.12 Nov 2, 2023
37 checks passed