Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PartitionEvaluatorArgs to WindowUDFImpl::partition_evaluator #12804

Merged
merged 7 commits into from
Oct 9, 2024

Conversation

jcsherin
Copy link
Contributor

@jcsherin jcsherin commented Oct 8, 2024

Which issue does this PR close?

Closes #12803.

Rationale for this change

In BuiltInWindowFunction::{Lead, Lag} to create an instance of PartitionEvaluator directly/indirectly depends on the following being available:

  1. arguments to the window function expression,
  2. the data type of the arguments,
  3. whether IGNORE NULLS is specified,
  4. whether the execution order is reversed.

Currently none of this is available when implementing user-defined window functions.

What changes are included in this PR?

  1. Add PartitionEvaluatorArgs to WindowUDFImpl::partition_evaulator.
pub trait WindowUDFImpl: Debug + Send + Sync {

    /// Invoke the function, returning the [`PartitionEvaluator`] instance
    fn partition_evaluator(
        &self,
        partition_evaluator_args: PartitionEvaluatorArgs,
    ) -> Result<Box<dyn PartitionEvaluator>>;

}
  1. Defines PartitionEvaluatorArgs:
/// Defines the state of the user-defined window function during
/// physical execution.
#[derive(Debug, Default)]
pub struct PartitionEvaluatorArgs<'a> {
    /// The expressions passed as arguments to the user-defined window
    /// function.
    input_exprs: &'a [Arc<dyn PhysicalExpr>],
    /// The corresponding data types of expressions passed as arguments
    /// to the user-defined window function.
    input_types: &'a [DataType],
    /// Set to `true` if the user-defined window function is reversed.
    is_reversed: bool,
    /// Set to `true` if `IGNORE NULLS` is specified.
    ignore_nulls: bool,
}

Are these changes tested?

Yes. Updated unit tests for row_number() user-defined window function.

Are there any user-facing changes?

Yes.

datafusion_target_doc_datafusion_functions_window_common_partition_struct PartitionEvaluatorArgs html

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate proto Related to proto crate labels Oct 8, 2024
@alamb alamb added the api change Changes the API exposed to users of the crate label Oct 8, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jcsherin -- this PR makes sense to me

I had a small API suggestions (but I also think we could do it as a follow on PR)

Also, I think this PR is an API change (for the better) so I added the api-change label

Comment on lines 22 to 23
/// Defines the state of the user-defined window function during
/// physical execution.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description doesn't quite match what I think the structure does. Perhaps something like this would be more appropriate

Suggested change
/// Defines the state of the user-defined window function during
/// physical execution.
/// Arguments passed to create user-defined window function state during
/// physical execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 66 to 70
/// Returns `Some(expr)` argument at index if it exists, otherwise
/// returns `None`.
pub fn input_expr_at(&self, index: usize) -> Option<&Arc<dyn PhysicalExpr>> {
self.input_exprs.get(index)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an API that would be more consistent with others in this would be to expose the input_exprs directly.

Among other things that would allow users of this API to find out how many (len()) were passed as well as get an iterator, etc.

    /// Returns the arguments passed to the function
    pub fn input_exprs(&self) -> &'a [Arc<dyn PhysicalExpr>] {
        self.input_exprs
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

I extended the change to also expose input_types directly for similar reasons.

    /// Returns the expressions passed as arguments to the user-defined
    /// window function.
    pub fn input_exprs(&self) -> &'a [Arc<dyn PhysicalExpr>] {
        self.input_exprs
    }

    /// Returns the [`DataType`]s corresponding to the input expressions
    /// to the user-defined window function.
    pub fn input_types(&self) -> &'a [DataType] {
        self.input_types
    }

@@ -385,7 +386,13 @@ impl BuiltInWindowFunctionExpr for WindowUDFExpr {
}

fn create_evaluator(&self) -> Result<Box<dyn PartitionEvaluator>> {
self.fun.partition_evaluator_factory()
self.fun
.partition_evaluator_factory(PartitionEvaluatorArgs::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the key reason to add these args

@alamb alamb merged commit 30de35e into apache:main Oct 9, 2024
26 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 9, 2024

Thanks again @jcsherin

@jcsherin
Copy link
Contributor Author

jcsherin commented Oct 9, 2024

Thanks for the review @alamb

@jcsherin jcsherin deleted the add-partition-evaluator-args branch October 9, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions proto Related to proto crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add PartitionEvaluatorArgs to WindowUDFImpl::partition_evaluator
2 participants