
Update to arrow 28 #4400

Merged: 8 commits merged into apache:master on Nov 30, 2022
Conversation

tustvold (Contributor) opened this pull request:
Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added labels: core (Core DataFusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), physical-expr (Physical Expressions), sql (SQL Planner) on Nov 28, 2022
```diff
@@ -287,8 +287,8 @@ fn get_wider_decimal_type(
     (DataType::Decimal128(p1, s1), DataType::Decimal128(p2, s2)) => {
         // max(s1, s2) + max(p1-s1, p2-s2), max(s1, s2)
         let s = *s1.max(s2);
-        let range = (p1 - s1).max(p2 - s2);
+        let range = (*p1 as i8 - s1).max(*p2 as i8 - s2);
         Some(create_decimal_type(range + s, s))
```
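For context, a minimal sketch of the widening rule in this hunk, assuming arrow 28's `Decimal128(u8 precision, i8 scale)` signature; the standalone function below is illustrative, not DataFusion's actual helper:

```rust
use arrow::datatypes::DataType;

// Widen two Decimal128 types: keep the larger scale, plus enough integer
// digits (precision - scale) to represent either input without truncation.
// A production version would also cap precision at 38.
fn wider_decimal(p1: u8, s1: i8, p2: u8, s2: i8) -> DataType {
    let s = s1.max(s2);
    // Precision is u8 but scale is i8 in arrow 28, so cast before
    // subtracting; this also handles negative scales, where p - s is
    // larger than p (e.g. 10 - (-2) = 12 integer digits).
    let range = (p1 as i8 - s1).max(p2 as i8 - s2);
    DataType::Decimal128((range + s) as u8, s)
}
```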
tustvold (author):
Perhaps @liukun4515 you might be able to cast your eyes over this? My knowledge of decimals is fairly limited.

liukun4515 (Contributor):
Thanks for the mention. I think the arrow ecosystem supports negative scales, but in DataFusion we can restrict to scale >= 0.
We can add a check at the SQL level to ensure scale >= 0 in DataFusion.
cc @alamb

liukun4515 (Contributor):
In DataFusion we only support Decimal128, and we should ensure that precision >= 1, precision <= 38, scale >= 0, and scale <= precision.

Do you have any thoughts on that? @tustvold
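For illustration, a minimal sketch of such a check at the SQL level, based on the constraints listed above; the function name and error handling are hypothetical, not DataFusion's actual API:

```rust
// Hypothetical planner-side validation: reject decimal types the SQL
// layer should not accept, even though arrow itself permits them.
fn validate_decimal_type(precision: u8, scale: i8) -> Result<(), String> {
    if precision < 1 || precision > 38 {
        return Err(format!("precision {precision} must be between 1 and 38"));
    }
    // `scale < 0` is checked first, so the cast to u8 below is safe.
    if scale < 0 || scale as u8 > precision {
        return Err(format!(
            "scale {scale} must be between 0 and precision {precision}"
        ));
    }
    Ok(())
}
```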

tustvold (author):
> but in DataFusion we can restrict to scale >= 0

From an outsider's perspective, at least w.r.t. decimals, I think breaking compatibility with the arrow specification in this way would likely surprise people. Is there a particular reason not to support negative scales?

liukun4515 (Contributor):
Using negative scales in the arrow ecosystem is fine with me. But DataFusion is a SQL-level system, so it's better to stay consistent with other SQL systems such as Spark, PostgreSQL, and MySQL. I have not seen negative scales used in any other SQL-level system.

In PostgreSQL:

```
postgres=# create table test(c1 decimal(10,-1));
ERROR:  NUMERIC scale -1 must be between 0 and precision 10
LINE 1: create table test(c1 decimal(10,-1));
```

In Spark:

```
spark-sql> create table test_d(c1 decimal(10,-1));
Error in query:
extraneous input '-' expecting INTEGER_VALUE(line 1, pos 34)

== SQL ==
create table test_d(c1 decimal(10,-1))
```
tustvold (author) commented Nov 30, 2022:
Spark definitely supports negative scales; it just might not expose them in its SQL frontend. I think this PR is consistent with that: the SQL frontend still doesn't support negative scales, but DataFusion the query engine does.
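For reference, a small sketch of what a negative scale means for a Decimal128 value (illustrative arithmetic only; `unscaled` stands for the raw i128 the array stores):

```rust
// A decimal value is unscaled * 10^(-scale), so a negative scale shifts
// the decimal point to the right: scale -2 means the stored digits count
// hundreds.
fn decimal_to_f64(unscaled: i128, scale: i8) -> f64 {
    unscaled as f64 * 10f64.powi(-(scale as i32))
}

fn main() {
    // scale 2: two fractional digits -> approximately 123.45 in f64
    assert!((decimal_to_f64(12345, 2) - 123.45).abs() < 1e-9);
    // scale -2: the stored 123 represents 12300
    assert_eq!(decimal_to_f64(123, -2), 12300.0);
}
```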

```diff
@@ -705,8 +705,9 @@ enum IntervalUnit{
 }

 message Decimal{
-  uint64 whole = 1;
-  uint64 fractional = 2;
+  reserved 1, 2;
```
tustvold (author):
This is a breaking change to the protobuf serialization of schemas.

alamb (Contributor) left a comment:
Thanks @tustvold -- this looks great. We can probably remove the crates.io patch and get this PR ready for review.

I agree it would be great if @liukun4515 could review the change to Decimal 🙏

```
@@ -947,12 +946,12 @@ mod tests {
    fn test_dfschema_to_schema_convertion() {
        let mut a: DFField = DFField::new(Some("table1"), "a", DataType::Int64, false);
        let mut b: DFField = DFField::new(Some("table1"), "b", DataType::Int64, false);
        let mut a_metadata = BTreeMap::new();
```
Contributor:
❤️ -- thank you

```diff
@@ -280,16 +280,6 @@ fn external_props(schema: &AvroSchema) -> BTreeMap<String, String> {
     props
 }

-#[allow(dead_code)]
```
Contributor:
🧹 👍

@tustvold tustvold marked this pull request as ready for review November 29, 2022 16:08
```diff
-            Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
-                *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v
+            Expr::Literal(ScalarValue::Decimal128(Some(v), _p, s)) => {
+                *s >= 0
```
Contributor:
👍
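For context, a hedged sketch of why the sign check in this hunk matters once scale is `i8`; the table contents and function name below are illustrative, not the actual DataFusion constants:

```rust
// Illustrative: casting a negative i8 scale straight to usize
// sign-extends to an enormous index, so the table lookup would panic.
const POWS_OF_TEN: [i128; 5] = [1, 10, 100, 1_000, 10_000];

fn literal_is_pow_of_ten(v: i128, s: i8) -> bool {
    // Check the sign (and bounds) before indexing with `s as usize`.
    s >= 0 && (s as usize) < POWS_OF_TEN.len() && POWS_OF_TEN[s as usize] == v
}

fn main() {
    assert!(literal_is_pow_of_ten(100, 2));
    assert!(!literal_is_pow_of_ten(100, -2)); // safe: the guard short-circuits
}
```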

liukun4515 (Contributor) left a comment:

LGTM
Thanks @tustvold

@liukun4515 liukun4515 merged commit fdc83e8 into apache:master Nov 30, 2022
ursabot commented Nov 30, 2022:

Benchmark runs are scheduled for baseline = 49166ea and contender = fdc83e8. fdc83e8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
