Update to arrow 28 #4400
Conversation
```diff
@@ -287,8 +287,8 @@ fn get_wider_decimal_type(
 (DataType::Decimal128(p1, s1), DataType::Decimal128(p2, s2)) => {
     // max(s1, s2) + max(p1-s1, p2-s2), max(s1, s2)
     let s = *s1.max(s2);
-    let range = (p1 - s1).max(p2 - s2);
-    Some(create_decimal_type(range + s, s))
+    let range = (*p1 as i8 - s1).max(*p2 as i8 - s2);
```
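For context, here is a minimal sketch of the widening rule in the hunk above, assuming arrow 28's `Decimal128(precision: u8, scale: i8)` representation; the `create_decimal_type` stub is a stand-in for the helper referenced in the diff, not its actual implementation:

```rust
use arrow::datatypes::DataType;

// Stand-in for the helper referenced in the diff; the real one may
// clamp precision to the Decimal128 maximum.
fn create_decimal_type(precision: u8, scale: i8) -> DataType {
    DataType::Decimal128(precision, scale)
}

fn get_wider_decimal_type(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    match (lhs, rhs) {
        (DataType::Decimal128(p1, s1), DataType::Decimal128(p2, s2)) => {
            // Keep the larger scale and enough integer digits for both
            // inputs: precision = max(p1 - s1, p2 - s2) + max(s1, s2).
            let s = *s1.max(s2);
            // precision is u8 but scale is now i8, so cast before
            // subtracting: with a negative scale, p - s can exceed p.
            let range = (*p1 as i8 - s1).max(*p2 as i8 - s2);
            Some(create_decimal_type((range + s) as u8, s))
        }
        _ => None,
    }
}
```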
Perhaps @liukun4515 you might be able to cast your eyes over this? My knowledge of decimals is fairly limited.
Thanks for the mention. I think the arrow ecosystem supports negative scale, but in DataFusion we can restrict support to scale >= 0. We can add a check at the SQL level in DataFusion to ensure scale >= 0.
cc @alamb
In DataFusion we only support Decimal128, and we should make sure that precision >= 1, precision <= 38, scale >= 0, and scale <= precision.
Do you have any thoughts on that? @tustvold
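As a concrete illustration of the constraints above, a hypothetical SQL-level check (not DataFusion's actual code, where the validation may live elsewhere and differ in detail) might look like:

```rust
// Hypothetical validator for Decimal128 type parameters, enforcing
// 1 <= precision <= 38 and 0 <= scale <= precision as described above.
fn validate_decimal128(precision: u8, scale: i8) -> Result<(), String> {
    if !(1..=38).contains(&precision) {
        return Err(format!("precision {precision} must be between 1 and 38"));
    }
    if scale < 0 || scale as u8 > precision {
        return Err(format!(
            "scale {scale} must be between 0 and precision {precision}"
        ));
    }
    Ok(())
}
```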
> but in the datafusion we can support scale>=0
From an outsider's perspective, at least w.r.t. decimals, I think breaking compatibility with the arrow specification in this way would likely be surprising to people. Is there a particular reason not to support negative scale?
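For background, in the arrow format a decimal's value is its unscaled integer times 10^(-scale), so a negative scale shifts the decimal point to the right. A small illustration with a hypothetical helper (not part of this PR):

```rust
// Value represented by a Decimal128: unscaled * 10^(-scale).
// f64 is used here only for illustration; real decimal arithmetic
// stays in i128.
fn decimal_value(unscaled: i128, scale: i8) -> f64 {
    unscaled as f64 * 10f64.powi(-(scale as i32))
}

fn main() {
    // Scale -2 storing 123 represents 12300: the decimal point sits
    // two places to the right of the stored digits.
    assert_eq!(decimal_value(123, -2), 12300.0);
    // A positive scale moves it left: scale 2 storing 12345 is 123.45.
    println!("{}", decimal_value(12345, 2));
}
```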
Using negative scale within the arrow ecosystem is OK with me. But DataFusion is a SQL-level system, so it's better to stay consistent with other SQL systems like Spark, PG, and MySQL; in other SQL-level systems I have not seen negative scale used.

In PG:

```
postgres=# create table test(c1 decimal(10,-1));
ERROR: NUMERIC scale -1 must be between 0 and precision 10
LINE 1: create table test(c1 decimal(10,-1));
```

In Spark:

```
spark-sql> create table test_d(c1 decimal(10,-1));
Error in query:
extraneous input '-' expecting INTEGER_VALUE(line 1, pos 34)

== SQL ==
create table test_d(c1 decimal(10,-1))
```
Spark definitely supports negative scales; it might not expose them in its SQL frontend, though. I think this PR is consistent with that: the SQL frontend still doesn't support negative scales, but DataFusion the query engine does.
```diff
@@ -705,8 +705,9 @@ enum IntervalUnit{
 }

 message Decimal{
-  uint64 whole = 1;
-  uint64 fractional = 2;
+  reserved 1, 2;
```
This is a breaking change to the protobuf serialization of schemas.
Thanks @tustvold -- this looks great. We can probably remove the crates.io patch and get this PR ready for review.
I agree it would be great if @liukun4515 could review the change to Decimal 🙏
```diff
@@ -947,12 +946,12 @@ mod tests {
 fn test_dfschema_to_schema_convertion() {
     let mut a: DFField = DFField::new(Some("table1"), "a", DataType::Int64, false);
     let mut b: DFField = DFField::new(Some("table1"), "b", DataType::Int64, false);
     let mut a_metadata = BTreeMap::new();
```
❤️ -- thank you
```diff
@@ -280,16 +280,6 @@ fn external_props(schema: &AvroSchema) -> BTreeMap<String, String> {
     props
 }

-#[allow(dead_code)]
```
🧹 👍
```diff
-Expr::Literal(ScalarValue::Decimal128(Some(v), _p, _s)) => {
-    *_s < DECIMAL128_MAX_PRECISION && POWS_OF_TEN[*_s as usize] == *v
+Expr::Literal(ScalarValue::Decimal128(Some(v), _p, s)) => {
+    *s >= 0
```
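For context, this check appears to decide whether a decimal literal equals exactly 1 (i.e. its unscaled value is 10^scale), which with an i8 scale now also requires scale >= 0. A rough sketch under that assumption, with a hypothetical helper name and `POWS_OF_TEN` truncated for brevity:

```rust
// Truncated stand-in for a lookup table of 10^0..=10^37 as i128.
const POWS_OF_TEN: [i128; 5] = [1, 10, 100, 1_000, 10_000];

/// Does the Decimal128 literal with unscaled value `v` and scale `s`
/// equal exactly 1? That holds iff v == 10^s, which is only possible
/// for s >= 0 (a negative scale can never represent 1 exactly).
fn decimal_is_one(v: i128, s: i8) -> bool {
    s >= 0 && (s as usize) < POWS_OF_TEN.len() && POWS_OF_TEN[s as usize] == v
}
```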
👍
LGTM
Thanks @tustvold
Benchmark runs are scheduled for baseline = 49166ea and contender = fdc83e8. fdc83e8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?