
Update to arrow 28 #4400

Merged: 8 commits merged into apache:master on Nov 30, 2022
Conversation

tustvold (Contributor) opened this pull request:
Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added labels: core (Core DataFusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), physical-expr (Physical Expressions), sql (SQL Planner) on Nov 28, 2022
```diff
@@ -287,8 +287,8 @@ fn get_wider_decimal_type(
     (DataType::Decimal128(p1, s1), DataType::Decimal128(p2, s2)) => {
         // max(s1, s2) + max(p1-s1, p2-s2), max(s1, s2)
         let s = *s1.max(s2);
-        let range = (p1 - s1).max(p2 - s2);
+        let range = (*p1 as i8 - s1).max(*p2 as i8 - s2);
         Some(create_decimal_type(range + s, s))
```
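For context, a minimal sketch of the widening rule in this hunk, assuming arrow 28's `Decimal128(u8 precision, i8 scale)` signature; the standalone function below is illustrative, not DataFusion's actual helper:

```rust
use arrow::datatypes::DataType;

// Widen two Decimal128 types: keep the larger scale, plus enough integer
// digits (precision - scale) to represent either input without truncation.
// A production version would also cap precision at 38.
fn wider_decimal(p1: u8, s1: i8, p2: u8, s2: i8) -> DataType {
    let s = s1.max(s2);
    // Precision is u8 but scale is i8 in arrow 28, so cast before
    // subtracting; this also handles negative scales, where p - s is
    // larger than p (e.g. 10 - (-2) = 12 integer digits).
    let range = (p1 as i8 - s1).max(p2 as i8 - s2);
    DataType::Decimal128((range + s) as u8, s)
}
```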
tustvold (author):
Perhaps @liukun4515 you might be able to cast your eyes over this? My knowledge of decimals is fairly limited.

liukun4515 (Contributor):
Thanks for the mention. I think the arrow ecosystem supports negative scales, but in DataFusion we can restrict to scale >= 0.
We can add a check at the SQL level to ensure scale >= 0 in DataFusion.
cc @alamb

liukun4515 (Contributor):
In DataFusion we only support Decimal128, and we should ensure that precision >= 1, precision <= 38, scale >= 0, and scale <= precision.

Do you have any thoughts on that? @tustvold
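For illustration, a minimal sketch of such a check at the SQL level, based on the constraints listed above; the function name and error handling are hypothetical, not DataFusion's actual API:

```rust
// Hypothetical planner-side validation: reject decimal types the SQL
// layer should not accept, even though arrow itself permits them.
fn validate_decimal_type(precision: u8, scale: i8) -> Result<(), String> {
    if precision < 1 || precision > 38 {
        return Err(format!("precision {precision} must be between 1 and 38"));
    }
    // `scale < 0` is checked first, so the cast to u8 below is safe.
    if scale < 0 || scale as u8 > precision {
        return Err(format!(
            "scale {scale} must be between 0 and precision {precision}"
        ));
    }
    Ok(())
}
```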

tustvold (author):
> but in DataFusion we can restrict to scale >= 0

From an outsider's perspective, at least w.r.t. decimals, I think breaking compatibility with the arrow specification in this way would likely surprise people. Is there a particular reason not to support negative scales?

liukun4515 (Contributor):
Using negative scales in the arrow ecosystem is fine with me. But DataFusion is a SQL-level system, so it's better to stay consistent with other SQL systems such as Spark, PostgreSQL, and MySQL. I have not seen negative scales used in any other SQL-level system.

In PostgreSQL:

```
postgres=# create table test(c1 decimal(10,-1));
ERROR:  NUMERIC scale -1 must be between 0 and precision 10
LINE 1: create table test(c1 decimal(10,-1));
```

In Spark:

```
spark-sql> create table test_d(c1 decimal(10,-1));
Error in query:
extraneous input '-' expecting INTEGER_VALUE(line 1, pos 34)

== SQL ==
create table test_d(c1 decimal(10,-1))
```
tustvold (author) commented Nov 30, 2022:
Spark definitely supports negative scales; it just might not expose them in its SQL frontend. I think this PR is consistent with that: the SQL frontend still doesn't support negative scales, but DataFusion the query engine does.
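For reference, a small sketch of what a negative scale means for a Decimal128 value (illustrative arithmetic only; `unscaled` stands for the raw i128 the array stores):

```rust
// A decimal value is unscaled * 10^(-scale), so a negative scale shifts
// the decimal point to the right: scale -2 means the stored digits count
// hundreds.
fn decimal_to_f64(unscaled: i128, scale: i8) -> f64 {
    unscaled as f64 * 10f64.powi(-(scale as i32))
}

fn main() {
    // scale 2: two fractional digits -> approximately 123.45 in f64
    assert!((decimal_to_f64(12345, 2) - 123.45).abs() < 1e-9);
    // scale -2: the stored 123 represents 12300
    assert_eq!(decimal_to_f64(123, -2), 12300.0);
}
```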

```diff
@@ -705,8 +705,9 @@ enum IntervalUnit{
 }

 message Decimal{
-  uint64 whole = 1;
-  uint64 fractional = 2;
+  reserved 1, 2;
```
tustvold (author):
This is a breaking change to the protobuf serialization of schemas.

alamb (Contributor) left a comment:
Thanks @tustvold -- this looks great. We can probably remove the crates.io patch and get this PR ready for review.

I agree it would be great if @liukun4515 could review the change to Decimal 🙏

```
@@ -947,12 +946,12 @@ mod tests {
    fn test_dfschema_to_schema_convertion() {
        let mut a: DFField = DFField::new(Some("table1"), "a", DataType::Int64, false);
        let mut b: DFField = DFField::new(Some("table1"), "b", DataType::Int64, false);
        let mut a_metadata = BTreeMap::new();
```
Contributor:
❤️ -- thank you

```diff
@@ -280,16 +280,6 @@ fn external_props(schema: &AvroSchema) -> BTreeMap<String, String> {
     props
 }

-#[allow(dead_code)]
```
Contributor:
🧹 👍

@tustvold tustvold marked this pull request as ready for review November 29, 2022 16:08
```diff
-            Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
-                *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v
+            Expr::Literal(ScalarValue::Decimal128(Some(v), _p, s)) => {
+                *s >= 0
```
Contributor:
👍
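For context, a hedged sketch of why the sign check in this hunk matters once scale is `i8`; the table contents and function name below are illustrative, not the actual DataFusion constants:

```rust
// Illustrative: casting a negative i8 scale straight to usize
// sign-extends to an enormous index, so the table lookup would panic.
const POWS_OF_TEN: [i128; 5] = [1, 10, 100, 1_000, 10_000];

fn literal_is_pow_of_ten(v: i128, s: i8) -> bool {
    // Check the sign (and bounds) before indexing with `s as usize`.
    s >= 0 && (s as usize) < POWS_OF_TEN.len() && POWS_OF_TEN[s as usize] == v
}

fn main() {
    assert!(literal_is_pow_of_ten(100, 2));
    assert!(!literal_is_pow_of_ten(100, -2)); // safe: the guard short-circuits
}
```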

liukun4515 (Contributor) left a comment:

LGTM
Thanks @tustvold

@liukun4515 liukun4515 merged commit fdc83e8 into apache:master Nov 30, 2022
ursabot commented Nov 30, 2022:

Benchmark runs are scheduled for baseline = 49166ea and contender = fdc83e8. fdc83e8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
