Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: Change display of Window and Aggregate Functions #8353

Closed
wants to merge 12 commits into from
Closed

MINOR: Change display of Window and Aggregate Functions #8353

wants to merge 12 commits into from

Conversation

mustafasrepo
Copy link
Contributor

@mustafasrepo mustafasrepo commented Nov 29, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

As discussed in link. Currently, it is not obvious which aggregate or window functions are reversed. The reason is that, display of window expressions are static (comes from logical expression).

This PR changes display implementation of window expression so that they do not use static logical plan window expressions. With this new PR, display String is constructed using (function name, partition by (if available), order by expressions (if available)).

What changes are included in this PR?

Displays of WindowAggExec, BoundedWindowAggExec, AggregateExec are changed in the tests.

Are these changes tested?

Existing tests should work.

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Nov 29, 2023
@mustafasrepo mustafasrepo marked this pull request as draft November 29, 2023 08:42
# Conflicts:
#	datafusion/physical-expr/src/window/nth_value.rs
#	datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs
#	datafusion/sqllogictest/test_files/window.slt
@mustafasrepo mustafasrepo marked this pull request as ready for review November 29, 2023 10:35
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mustafasrepo -- the output of this PR looks much nicer, but I am confused about the seemingly parallel API.

@@ -906,17 +906,17 @@ mod tests {

let physical_plan = bounded_window_exec("non_nullable_col", sort_exprs, filter);

let expected_input = ["BoundedWindowAggExec: wdw=[count: Ok(Field { name: \"count\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(NULL), end_bound: CurrentRow }], mode=[Sorted]",
let expected_input = ["BoundedWindowAggExec: wdw=[COUNT(non_nullable_col@1) ORDER BY [non_nullable_col@1 ASC NULLS LAST], frame: WindowFrame { units: Range, start_bound: Preceding(NULL), end_bound: CurrentRow }], mode=[Sorted]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is certainly much nicer 👍

datafusion/physical-expr/src/aggregate/mod.rs Outdated Show resolved Hide resolved

/// Human readable name such as `"MIN(c2)"` or `"RANK()"`. The default
/// implementation returns `"FUNCTION_NAME(args)"`
fn display_name(&self) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this API change -- the AggregateFunction now has name(), func_name() and display_name()

I thought the reason for having name() was so that that the code that created the AggregateExpr would calculate the name once and then pass it along. The creation of the name happened once on construction rather than on demand

For example:
https://github.com/apache/arrow-datafusion/blob/4c914ea1e5d3dc61f3552adcffb8356c90d73bac/datafusion/proto/src/physical_plan/mod.rs#L464-L470

This PR seems to have added a parallel API for creating display names for aggregate functions which I think is quite confusing

Copy link
Contributor Author

@mustafasrepo mustafasrepo Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to remove dynamic calculation from implementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a try. However, replacing name with current display name did not work. It seems like during schema creation aggregate exprs uses .name for the field (See link). Hence plan depends on the .name string for successful execution. Otherwise subsequent projection after (window or aggregation) may refer to invalid fields.

Other option might be adding a display_name member to the each aggregate expression (similar to .name method). This display_name member will not be set from argument but will calculate at the initialization(Users will not have control over it). Then we will use .display_name(&self) -> &str API to return this member. With this way func_name argument will be removed. However, we will still have parallel .display_name on top .name which might be confusing. However, I think this approach is better than the current one.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that we could add a name field to WindowFunction upon creation to mirror what happens with AggregateFunction. That way the code would be consistent and we can use the same pattern that works for AggregateFunction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we have name, func_name and display_name is an indication that we can do a better design. @mustafasrepo and I will discuss to see what we can do here.

@alamb
Copy link
Contributor

alamb commented Dec 12, 2023

@mustafasrepo is this PR ready for review again?

@mustafasrepo
Copy link
Contributor Author

mustafasrepo commented Dec 13, 2023

@mustafasrepo is this PR ready for review again?

Not really, I didn't have time to experiment with alternative solutions. For now, I am converting this PR to draft. Once it is ready, I will un-draft it.

@mustafasrepo mustafasrepo reopened this Dec 13, 2023
@mustafasrepo mustafasrepo marked this pull request as draft December 13, 2023 06:16
@synnada-ai synnada-ai closed this by deleting the head repository Feb 16, 2024
@ozankabak
Copy link
Contributor

To avoid any confusion, we will resume this work in a different branch/repo (we are just moving code around between repos).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants