
[SPARK-23179][SQL] Support option to throw exception if overflow occurs during Decimal arithmetic #20350

Closed
wants to merge 11 commits

Conversation

mgaido91
Contributor

What changes were proposed in this pull request?

The SQL ANSI 2011 standard states that an exception should be thrown when an overflow occurs during an arithmetic operation. This is what most SQL databases do (e.g. SQL Server, DB2). Hive currently returns NULL (as Spark does), but HIVE-18291 is open to make it SQL compliant.

The PR introduces an option to decide which behavior Spark should follow, i.e. returning NULL on overflow or throwing an exception.
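Below is a minimal sketch of the two behaviors (the config name is the one added by this PR's diff, spark.sql.decimalOperations.nullOnOverflow; the query is only an illustrative expression whose result needs more than 38 digits of precision):

    // Sketch only: the default keeps the old behavior and returns NULL on overflow
    spark.conf.set("spark.sql.decimalOperations.nullOnOverflow", "true")
    spark.sql("select cast(9e37 as decimal(38,0)) * cast(10 as decimal(38,0))").show()  // NULL

    // With the option turned off, the SQL ANSI 2011 behavior is followed
    spark.conf.set("spark.sql.decimalOperations.nullOnOverflow", "false")
    spark.sql("select cast(9e37 as decimal(38,0)) * cast(10 as decimal(38,0))").show()  // throws ArithmeticException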

How was this patch tested?

added UTs

} else {
  val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
  if (nullOnOverflow) {
    logWarning(s"$message NULL is returned.")
Contributor

I am not sure if we should log this message. If we hit this often we'll end up with huge logs.

Contributor Author

If we hit it often, the result we get is quite useless. I added it only to notify the user of an unexpected/undesired situation which currently happens silently. I think it is bad that the user cannot know whether a NULL is the result of an operation involving NULLs or the result of an overflow.

Contributor

I agree that a result becomes less useful if we return nulls often. My problem is more that if we process a million non-convertible decimals we log the same message a million times, which is going to cause a significant regression. Moreover, this is logged on the executor, and an end-user typically does not look at those logs (there is also no reason to do so, since the job does not throw an error).

My suggestion would be to not log at all, or just log once. I prefer not to log at all.

Contributor Author

I see your point, and I agree with you, but I wanted to leave some trace of what was happening. What about using DEBUG as the log level? In that case we are not logging anything most of the time, but if we want to check whether an overflow is happening we can. What do you think?

Contributor

I am ok with using debug/trace level logging. Can you make sure we do not construct the message unless we are logging or throwing the exception (changing val into def should be enough)?
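Something along these lines (a sketch only; it assumes the surrounding members from the quoted snippet and that the overflow branch returns null, which is an assumption here). Spark's logDebug takes its message argument by name, so with a def the string is only built when we actually log at DEBUG level or throw:

    // Sketch: `def` instead of `val`, so the message is not constructed eagerly
    def message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
    if (nullOnOverflow) {
      logDebug(s"$message NULL is returned.")
      null
    } else {
      throw new ArithmeticException(message)
    }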

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86484 has finished for PR 20350 at commit 449b69c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CheckOverflow(
  • final class Decimal extends Ordered[Decimal] with Serializable with Logging

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86488 has finished for PR 20350 at commit fcd665e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

Jenkins, retest this please

@mgaido91
Contributor Author

cc @gatorsmile @cloud-fan

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86495 has finished for PR 20350 at commit fcd665e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -49,7 +49,6 @@ select 1e35 / 0.1;

-- arithmetic operations causing a precision loss are truncated
select 123456789123456789.1234567890 * 1.123456789123456789;
select 0.001 / 9876543210987654321098765432109876543.2
Member

I think it is missing a ; before...

Contributor Author

yes, unfortunately I missed it somehow previously...

@SparkQA

SparkQA commented Jan 23, 2018

Test build #86524 has finished for PR 20350 at commit 610a595.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.doc("When true (default), if an overflow on a decimal occurs, then NULL is returned. " +
"Spark's older versions and Hive behave in this way. If turned to false, SQL ANSI 2011 " +
"specification, will be followed instead: an arithmetic exception is thrown. This is " +
"what most of the SQL databases do.")
Contributor

Tiny nit:

If turned to false, SQL ANSI 2011 specification, will be followed instead

This should be

If turned to false, SQL ANSI 2011 specification will be followed instead

@SparkQA

SparkQA commented Jan 23, 2018

Test build #86528 has finished for PR 20350 at commit c73471d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • final class Decimal extends Ordered[Decimal] with Serializable

@SparkQA

SparkQA commented Jan 23, 2018

Test build #86533 has finished for PR 20350 at commit 2c8e2c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

kindly ping @gatorsmile @cloud-fan

@gatorsmile
Member

Thanks for your contributions! Could you ping us again after 2.3 release?

@mgaido91
Contributor Author

sure, thanks @gatorsmile

@SparkQA

SparkQA commented Feb 7, 2018

Test build #87154 has finished for PR 20350 at commit bd8b645.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Feb 28, 2018

retest this please

@SparkQA

SparkQA commented Feb 28, 2018

Test build #87794 has finished for PR 20350 at commit bd8b645.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

mgaido91 commented Feb 28, 2018

The error is unrelated, and I am seeing it frequently in other builds. It seems something caused the flakiness of this test to increase. There is already a ticket for it: SPARK-23369, but it is becoming more and more important to fix it. It would also be great to check what increased the flakiness...

@kiszk
Member

kiszk commented Mar 1, 2018

retest this please

@mgaido91
Contributor Author

mgaido91 commented Mar 1, 2018

sorry @gatorsmile, now that RC for 2.3 has passed the vote, do you happen to have time to look at this? Thanks.

@SparkQA

SparkQA commented Mar 1, 2018

Test build #87841 has finished for PR 20350 at commit bd8b645.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Sure, will do the review in the next few days.

@SparkQA

SparkQA commented Jun 19, 2018

Test build #92087 has finished for PR 20350 at commit 069b861.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93072 has finished for PR 20350 at commit 069b861.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

My understanding from #21499 (comment) is that the plan you have in mind is to have this in 3.0 and not in 2.4 @gatorsmile , am I right? If this is the case, shall I close this now and reopen once 2.4 is out? Thanks.

@SparkQA

SparkQA commented Jul 22, 2018

Test build #93401 has finished for PR 20350 at commit 069b861.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mickjermsurawong-stripe
Contributor

hi @mgaido91! I'm hoping to have this feature in 3.0 too. Thank you for the work here :)
So +1 on this getting picked up and merged!

For now I'm trying to cherry-pick this to our local fork. I'd love your input here: how do you think we should handle the overflow problem when our dataset doesn't involve any arithmetic operation on a SQL type? When it is just round-tripping between a JVM BigDecimal and a SQL Decimal, we can still get back null.
I'm thinking that I would need a similar check at encoding/decoding time (the ScalaReflection code that builds the deserializer).

Concretely, this PR nicely fixes the arithmetic case, throwing an exception instead of returning null when the option is enabled:

    val result = spark
      .sql(s"select cast(${smallDecimalWithFullPrecision} as decimal(38, 38)) + 1")
      .first()
    result shouldEqual Row(null)

However, dataset round-tripping here will still return null:

    val result: Seq[BigDecimal] = spark
      .createDataset(Seq(BigDecimal("123456789" * 4)))(ExpressionEncoder[BigDecimal])
      .map(identity(_))(ExpressionEncoder[BigDecimal])
      .collect()
    result shouldEqual Seq(null)

@mgaido91
Contributor Author

mgaido91 commented Jun 21, 2019

@mickjermsurawong-stripe thanks for your comment. I am updating this PR to resolve the conflicts, and I hope that your feedback will help this PR move forward.

As far as your question is concerned, you may consider adding an AssertNotNull to the output of the decoding, in order to get an exception in case that conversion fails. This is not really feasible, of course, if your input BigDecimal can contain null, i.e. if it is an Option[BigDecimal] you want to get in SQL. Another option which may work for you is what this PR now does, after my update, in the RowEncoder. The check there may solve your issue too.
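For completeness, a user-level approximation of that idea (a hypothetical sketch, not the catalyst-level AssertNotNull and not this PR's code): fail fast during the round-trip as soon as a null shows up where the input was known to be non-null.

    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

    // Sketch: the overflowing value is encoded to null by createDataset, so the
    // map sees null and can raise an error instead of silently propagating it.
    val checked = spark
      .createDataset(Seq(BigDecimal("123456789" * 4)))(ExpressionEncoder[BigDecimal])
      .map { d =>
        if (d == null) throw new ArithmeticException("decimal overflowed during encoding")
        d
      }(ExpressionEncoder[BigDecimal])

    checked.collect()  // throws instead of returning Seq(null)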

@SparkQA

SparkQA commented Jun 21, 2019

Test build #106780 has finished for PR 20350 at commit 37f47ef.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 22, 2019

Test build #106782 has finished for PR 20350 at commit bc25c0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1441,6 +1441,16 @@ object SQLConf {
.booleanConf
.createWithDefault(true)

val DECIMAL_OPERATIONS_NULL_ON_OVERFLOW =
buildConf("spark.sql.decimalOperations.nullOnOverflow")
Contributor

Overflow can happen with non-decimal operations too; do we need a new config?

cc @JoshRosen

Contributor Author

Thanks for taking a look at this, @cloud-fan!

Yes, the non-decimal case is handled in #21599. I'd say that, in the non-decimal case, the situation is pretty different: overflow in decimal operations is handled by Spark today, by converting overflowing results to null, while overflow in non-decimal operations isn't handled at all currently.

In non-decimal operations we indeed return a wrong value (the Java way). So IMHO the current non-decimal behavior doesn't make any sense (considering this is SQL and not a low-level language like Java/Scala), and keeping it makes no sense either (we already discussed this in that PR actually).

Contributor

A DB does not have to follow the SQL standard completely in every corner. The current behavior in Spark is by design and I don't think it's nonsense.

I do agree that it's a valid requirement that some users want overflow to fail, but it should be protected by a config.

My question is whether we need one config for overflow, or 2 configs for decimal and non-decimal.

Contributor Author

A DB does not have to follow the SQL standard completely in every corner. The current behavior in Spark is by design and I don't think it's nonsense.

I am sorry, but I don't really agree with you on this. I know the discussion is a bit off-topic, but I'd just like to explain the reasons for my opinion. SQL is a declarative language, and here we are coupling the result/behavior to the specific execution language we are using. Spark is cross-language, but for arithmetic operations overflow follows the very peculiar behavior of the implementation language, which is:

  • against the SQL standard, and no other DB deviates from the standard in this respect, so it is (at least) very surprising for SQL users;
  • different from what happens in Python and R when you overflow in those languages (an int becomes a long and so on there).

So there is no Spark user other than Scala/Java ones who might understand the behavior Spark has in those cases. Sorry for being a bit off-topic, anyway.

My question is whether we need one config for overflow, or 2 configs for decimal and non-decimal.

Yes, this is the main point here. IMHO I'd prefer 2 configs, because when the config is turned off the behavior is completely different: in one case we return null, in the other we return wrong results. But I also see the value in keeping the number of configs, which is already pretty big, as small as possible. So I'd prefer 2 configs, but if you and the community think 1 is better, I can update the PR to make this config more generic.

Thanks for your feedback and the discussion!

Contributor

For now, I think separate flags are okay. Here's why:

  • While eventually we probably want to add flaggable non-Decimal overflow detection (see [SPARK-26218][SQL] Overflow on arithmetic operations returns incorrect result #21599 (comment)), these PRs should land separately (to limit scope of changes / code review). If we give this PR's flag a generic name, merge this PR, and then somehow fail to merge the integer overflow PR in time for 3.0 then we'd be facing a situation where we'd need to change the behavior of a released flag if we later merge the non-Decimal overflow PR.
  • If we implement separate flags for each type of overflow then that doesn't preclude us from later introducing a single flag which is used as the default value for the per-type flags.

I'm interested in whichever option allows us to make incremental progress by getting this merged (even if flagged off by default) so that we can rely on this functionality being available in 3.x instead of having to maintain it indefinitely in our own fork (with all of the associated long-term maintenance and testing burdens).

Contributor

One followup question regarding flag naming: is "overflow" the most precise term for the change made here? Or does this flag also change behavior in precision-loss scenarios? Maybe I'm getting tripped up on terminology here, since insufficient precision to represent small fractional quantities is essentially an "overflow" of the digit space reserved to represent the fractional part.

Contributor Author

Thanks for your comments @JoshRosen.
Yes, this deals with the overflow case. The underflow (or precision loss) case is handled in a different way, and the behavior depends on another config (see SPARK-22036): it either avoids precision loss, eventually causing overflow (old behavior), or truncates (as defined by the SQL standard, closely following the SQL Server behavior from which we derived our decimal operations implementation). So this flag is related only to the overflow case.
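To illustrate the distinction with a small sketch (the first query is taken from the SQL tests touched by this PR; the second is just an arbitrary overflowing expression):

    // Precision loss: the result still fits in precision 38, the extra
    // fractional digits are truncated (governed by SPARK-22036, not this flag).
    spark.sql("select 123456789123456789.1234567890 * 1.123456789123456789").show()

    // Overflow: the integral part no longer fits in 38 digits, so the result is
    // NULL today, or an ArithmeticException when nullOnOverflow is set to false.
    spark.sql("select cast(9e37 as decimal(38,0)) * cast(10 as decimal(38,0))").show()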

JoshRosen changed the title from "[SPARK-23179][SQL] Support option to throw exception if overflow occurs" to "[SPARK-23179][SQL] Support option to throw exception if overflow occurs during Decimal arithmetic" on Jun 25, 2019
@mickjermsurawong-stripe
Contributor

hi @mgaido91, I found two other places where we might want to add this check for consistent behavior.

@mgaido91
Contributor Author

@mickjermsurawong-stripe let me answer your two comments separately:

At the encoder level (thank you for the addition on RowEncoder in this PR). We may similarly need it for consistency here at SerializerBuildHelper

Not really: when we serialize a SQL decimal to a Java/Scala BigDecimal, we cannot have overflow, so this patch/case doesn't apply (as you can see, there is not even a CheckOverflow there).

When we use agg(sum) the aggregation

Thanks for pointing this out! Actually this is a weird operator, because it throws an exception on overflow in interpreted mode, while in codegen it doesn't. Moreover, it throws an IllegalArgumentException instead of an ArithmeticException. Since this is a weird behavior of the operator itself, I'd prefer a separate PR targeting that specific operator and addressing its behavior as a whole, since it needs to be revisited carefully. Are there concerns about this plan?

@cloud-fan
Contributor

I'd prefer a separate PR targeting that specific operator and addressing its behavior as a whole

SGTM

I'm merging this PR, thanks!

cloud-fan closed this in 3139d64 on Jun 27, 2019
@mgaido91
Contributor Author

thanks @cloud-fan and thank you all for the reviews!

@mickjermsurawong-stripe
Contributor

@mgaido91,

when we serialize a SQL decimal to a Java/Scala BigDecimal, we cannot have overflow

I think we should catch overflow when encoding a Java/Scala BigDecimal to a SQL Decimal (not the other way round), and that happens at the SerializerBuildHelper.
If there's no check at serialization, the Decimal will still wrap the java/scala BigDecimal until UnsafeRowWriter, where a null can be silently introduced. The proposal is to catch this earlier, at the encoding part.

With the current test structure in ExpressionEncoderSuite, this would fail with an NPE on the round-tripped results:

encodeDecodeTest(BigDecimal("9" * 21), "overflowing big decimal")

To pinpoint the encoder part, the new test here shows a null row for the decimal type:

  test("big decimal exceeding precision serialized to row") {
    val overflowing = BigDecimal("9" * 21)
    val encoder = ExpressionEncoder[BigDecimal]
    val row = encoder.toRow(overflowing)
    assert(row.get(0, DecimalType.SYSTEM_DEFAULT) === null)
  }

I can make a separate PR on this if this sounds good to you.

@JoshRosen
Contributor

JoshRosen commented Jun 27, 2019

Actually this is a weird operator, because it throws an exception on overflow in interpreted mode, while in codegen it doesn't. Moreover, it throws an IllegalArgumentException instead of an ArithmeticException. Since this is a weird behavior of the operator itself, I'd prefer a separate PR targeting that specific operator and addressing its behavior as a whole, since it needs to be revisited carefully. Are there concerns about this plan?

+1; it sounds like the difference between the codegen and interpreted paths is a separate, pre-existing bug. It's also especially hard to reason about because (AFAIK) the paths in DecimalAggregates are only used for certain sizes of decimals, so the behavioral inconsistency is triggered by a combination of precision/scale and codegen/interpreted (which is really confusing!).

For consistency, I think we should:

  1. Ensure that the agg(sum) codegen and interpreted paths behave the same w.r.t nullOnOverflow == true (the default / 2.x behavior).
  2. Respect the nullOnOverflow flag in agg(sum) codegen.

Let's make a followup JIRA for this change and a separate JIRA for the encoder changes @mickjermsurawong-stripe discussed in his comment (I can loop back later this morning or afternoon to help file these).

Edit: in @mickjermsurawong-stripe's PR we can improve test coverage for both sets of encoders (RowEncoder and ExpressionEncoder / ScalaReflection), since AFAIK we don't have dedicated unit tests for overflow detection in the encoders (this PR did improve things, but I don't think it's directly tested; if that change is covered here, then I think it's done a bit indirectly via another test).

@mgaido91
Contributor Author

I see now, sorry for misunderstanding @mickjermsurawong-stripe. I think it is fine to go ahead with your PR. I created https://issues.apache.org/jira/browse/SPARK-28200 for it. So please go ahead and submit your PR for that JIRA.

I created also https://issues.apache.org/jira/browse/SPARK-28201 for the MakeDecimal case. I'll work on it ASAP.

Thanks!

pull bot pushed a commit to Pandinosaurus/spark that referenced this pull request Jul 1, 2019
## What changes were proposed in this pull request?

In SPARK-23179, a flag was introduced to control the behavior in case of overflow on decimals. The behavior is: return `null` when `spark.sql.decimalOperations.nullOnOverflow` is true (the default and traditional Spark behavior); throw an `ArithmeticException` when that conf is false (according to the SQL standard and other DBs' behavior).

`MakeDecimal` so far had an inconsistent behavior: in codegen mode it returned `null` like the other operators, but in interpreted mode it threw an `IllegalArgumentException`.

The PR aligns `MakeDecimal`'s behavior with that of the other operators as defined in SPARK-23179. Now both modes either return `null` or throw an `ArithmeticException`, according to the value of `spark.sql.decimalOperations.nullOnOverflow`.

Credits for this PR to mickjermsurawong-stripe who pointed out the wrong behavior in apache#20350.

## How was this patch tested?

improved UTs

Closes apache#25010 from mgaido91/SPARK-28201.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
mapr-devops pushed a commit to mapr/spark that referenced this pull request Jul 5, 2019
## What changes were proposed in this pull request?

- Currently, `ExpressionEncoder` does not handle bigdecimal overflow. Round-tripping overflowing java/scala BigDecimal/BigInteger returns null.
  - The serializer encodes a java/scala BigDecimal to a sql Decimal, which still holds the underlying data of the former.
  - When writing out to UnsafeRow, `changePrecision` will return false and the row will hold a null value.
https://github.com/apache/spark/blob/24e1e41648de58d3437e008b187b84828830e238/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java#L202-L206
- In [SPARK-23179](apache#20350), an option to throw an exception on decimal overflow was introduced.
- This PR adds an option in `ExpressionEncoder` to throw when an overflowing BigDecimal/BigInteger is detected, before its corresponding Decimal gets written to the Row. This gives consistent behavior between decimal arithmetic on sql expressions (DecimalPrecision) and getting decimals from a dataframe (RowEncoder).

Thanks to mgaido91 for the very first PR `SPARK-23179` and follow-up discussion on this change.
Thanks to JoshRosen for working with me on this.

## How was this patch tested?

added unit tests

Closes apache#25016 from mickjermsurawong-stripe/SPARK-28200.

Authored-by: Mick Jermsurawong <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>