Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NTH_VALUE reverse support #8327

Merged
merged 4 commits into from
Nov 29, 2023
Merged

NTH_VALUE reverse support #8327

merged 4 commits into from
Nov 29, 2023

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Currently we cannot calculate NTH_VALUE result in reverse order. However, this can be done, if we were to support negative index for NTH_VALUE (as in python arrays). With this support we can calculate reverse representation of NTH_VALUE to better optimize some plans. Please note that negative index cannot be entered from the SQL query. In this sense, there is no difference from the user point of view.
However, query below

SELECT c, NTH_VALUE(c, 2) OVER(order by c DESC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as nv1
  FROM multiple_ordered_table
  ORDER BY c ASC
  LIMIT 5

where multiple_ordered_table satisfied c ASC ordering, is re-written as the following behind the scenes

SELECT c, NTH_VALUE(c, -2) OVER(order by c ASC RANGE BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING) as nv1
  FROM multiple_ordered_table
  ORDER BY c ASC
  LIMIT 5

which can be executed without introducing any SortExec to the plan.
For window frame reversal see Constructing Equivalent Window Frames for Reverse Expressions section in the blog post.

What changes are included in this PR?

This PR add negative index support to the NTH_VALUE window function to support reverse order execution.

Are these changes tested?

Yes.

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Nov 27, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mustafasrepo -- the basic code looks good to me but the tests are a little confusing to me

One thing I wonder is why not just allow negative argument to NTH_VALUE from SQL ? If you did that I think the tests would get much simpler / easier

datafusion/core/tests/window.rs Outdated Show resolved Hide resolved
datafusion/core/tests/window.rs Outdated Show resolved Hide resolved
@@ -77,17 +79,17 @@ impl NthValue {
n: u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not change this API to accept i64? It seems strange that the public interface doesn't support negative NTH valuess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgre doesn't have this feature. I thought it might be non-standard or unexpected. However we can this support in subsequent PRs.

datafusion/physical-expr/src/window/nth_value.rs Outdated Show resolved Hide resolved
@@ -3493,6 +3493,48 @@ select sum(1) over() x, sum(1) over () y
----
1 1

query TT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please leave a comment explaining what this is a test for (specifically that you are expecting that the nth value should be negative)?

Also I can't see a negative nth value in the plan so maybe I am missing something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure added a comment. However, since window expression names are string, without string manipulation it is not easy to reflect window reversal in the plan. However, I think we should do this absolutely. I will fix this problem in subsequent PRs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @mustafasrepo

@mustafasrepo mustafasrepo merged commit e21b031 into apache:main Nov 29, 2023
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants