Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support negative preceding/following for ROW-based window functions #9229

Merged
merged 17 commits into from
Sep 26, 2023

Conversation

mythrocks
Copy link
Collaborator

@mythrocks mythrocks commented Sep 12, 2023

Fixes #5314.
Depends on rapidsai/cudf#14093.

This commit adds support for negative values for preceding/following offsets specified for ROW based window functions.

Prior to this commit, window function queries such as the following were not supported:

SELECT MIN(x) OVER (PARTITION BY grp 
                    ORDER BY oby 
                    ROWS BETWEEN 5 PRECEDING AND -1 FOLLOWING) min_x 
FROM mytable

For this query, the window includes all rows between upto 5 rows preceding the current row, and the previous row.

This functionality is currently supported only for:

  1. AVG
  2. COUNT(1)/COUNT(*)
  3. MAX
  4. MIN
  5. SUM
  6. COLLECT_LIST
  7. COLLECT_SET

Explicit loops.
TODO: Replace comprehensions with maps.
1. AVG is slightly off at high precision. @approx_decimal would be nice to have.
2. COLLECT_SET results don't match. Needs investigation.
@mythrocks mythrocks marked this pull request as draft September 12, 2023 22:29
@mythrocks
Copy link
Collaborator Author

Keeping this in draft, until rapidsai/cudf#14093 is merged, and available in spark-rapids-jni.

@mythrocks mythrocks self-assigned this Sep 12, 2023
@mythrocks mythrocks added feature request New feature or request cudf_dependency An issue or PR with this label depends on a new feature in cudf labels Sep 12, 2023
revans2
revans2 previously approved these changes Sep 13, 2023
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits

revans2
revans2 previously approved these changes Sep 18, 2023
integration_tests/src/main/python/window_function_test.py Outdated Show resolved Hide resolved
revans2
revans2 previously approved these changes Sep 18, 2023
@mythrocks
Copy link
Collaborator Author

rapidsai/cudf#14093 was just merged. It shouldn't be long before this change can be put through CI.

@mythrocks
Copy link
Collaborator Author

Build

@mythrocks mythrocks marked this pull request as ready for review September 25, 2023 17:25
@mythrocks
Copy link
Collaborator Author

Build

@mythrocks
Copy link
Collaborator Author

I've fixed an error in the lower-bounds checks: 55a3ec9

Spark bounds are positive and negative, depending on where it lies relative to the current row. This is different from CUDF, where preceding/following are positive; the direction is interpreted based on whether it's preceding or following.

@mythrocks
Copy link
Collaborator Author

Another fix: I'm skipping test_window_aggs_for_negative_rows_unpartitioned on Spark versions that do not have the fix posted in apache/spark#37540.

@mythrocks
Copy link
Collaborator Author

Build

@mythrocks mythrocks merged commit e976724 into NVIDIA:branch-23.10 Sep 26, 2023
29 checks passed
@mythrocks
Copy link
Collaborator Author

Thank you for the review, @revans2. I have merged this change.

@mythrocks mythrocks deleted the offset-row-windows-wip branch November 2, 2023 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support window.rowsBetween(Window.unboundedPreceding, -1)
2 participants