Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make ntile work in some corner cases #8371

Merged
merged 4 commits into from
Dec 1, 2023
Merged

Conversation

haohuaijin
Copy link
Contributor

Which issue does this PR close?

Closes #8284

Rationale for this change

fix #8284

What changes are included in this PR?

Add parameter type checking using Signature,
before

❯ select ntile('1') OVER(ORDER BY a) from t1;
Internal error: Cannot convert Utf8("1") to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
❯ select ntile([]) OVER(ORDER BY a) from t1;
Internal error: Cannot convert List([NullArray(0)]) to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

after

❯ select ntile([]) OVER(ORDER BY a) from t1;
Error during planning: No function matches the given name and argument types 'NTILE(List(Field { name: "item", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }))'. You might need to add explicit type casts.
        Candidate functions:
        NTILE(UInt64/UInt32/UInt16/UInt8/Int64/Int32/Int16/Int8)
❯ select ntile('1') OVER(ORDER BY a) from t1;
Error during planning: No function matches the given name and argument types 'NTILE(Utf8)'. You might need to add explicit type casts.
        Candidate functions:
        NTILE(UInt64/UInt32/UInt16/UInt8/Int64/Int32/Int16/Int8)

make ntile work properly, when parameter greater than table's row number

Are these changes tested?

yes, add more test

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Nov 30, 2023
@haohuaijin
Copy link
Contributor Author

haohuaijin commented Nov 30, 2023

cc @msirek @alamb , PTAL, I fix all incorrect results find in #8284, and add more tests.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @haohuaijin we prob can also test partition by null order by null cases

@alamb
Copy link
Contributor

alamb commented Nov 30, 2023

Perhaps @mustafasrepo or @ozankabak has time to review this code as they are more familiar with NTILE

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code looks good -- thank you @haohuaijin . If you can make the changes requested by @comphead I think this PR will be good to go

@haohuaijin
Copy link
Contributor Author

@comphead I find partition by null and order by null will fail in all windows function(maybe, I'm not test all)

❯ CREATE TABLE t1 (a int) AS VALUES (1), (2), (3);
0 rows in set. Query took 0.018 seconds.

❯ select rank() over (partition by null) from t1;
thread 'tokio-runtime-worker' panicked at /home/hhj/datafusion/datafusion/physical-plan/src/repartition/mod.rs:208:79:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("must either specify a row count or at least one column")
Execution error: Expects PARTITION BY expression to be ordered

❯ select rank() over (order by null) from t1;
type_coercion
caused by
Internal error: Cannot run range queries on datatype: Null.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

❯ select sum(a) over (order by null) from t1;
type_coercion
caused by
Internal error: Cannot run range queries on datatype: Null.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

❯ select sum(a) over (partition by null) from t1;
Internal error: All partition by columns should have an ordering.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

I file an issue #8386 to track this.

Copy link
Contributor

@mustafasrepo mustafasrepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @haohuaijin for this PR. It is rally nice. I added a minor comment (purely stylistic, you can ignore it if you wish). This PR is LGTM!.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @haohuaijin and @mustafasrepo

@alamb alamb merged commit eb5aa22 into apache:main Dec 1, 2023
22 checks passed
@haohuaijin haohuaijin deleted the fix_ntile branch December 1, 2023 23:33
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 14, 2023
* fix: make ntile work in some corner cases

* fix comments

* minor

* Update datafusion/sqllogictest/test_files/window.slt

Co-authored-by: Mustafa Akur <[email protected]>

---------

Co-authored-by: Mustafa Akur <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Miscellaneous ntile function bugs, possible incorrect results
4 participants