
[SPARK-23179][SQL] Support option to throw exception if overflow occurs during Decimal arithmetic #20350

Closed
wants to merge 11 commits

Conversation

mgaido91
Contributor

What changes were proposed in this pull request?

The SQL ANSI 2011 standard states that an exception should be thrown when an overflow occurs during an arithmetic operation. This is what most SQL databases do (e.g. SQL Server, DB2). Hive currently returns NULL (as Spark does), but HIVE-18291 is open to make it SQL compliant.

The PR introduces an option to decide which behavior Spark should follow, i.e. returning NULL on overflow or throwing an exception.
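Below is a minimal sketch of the two behaviors (the config name is the one added by this PR's diff, spark.sql.decimalOperations.nullOnOverflow; the query is only an illustrative expression whose result needs more than 38 digits of precision):

    // Sketch only: the default keeps the old behavior and returns NULL on overflow
    spark.conf.set("spark.sql.decimalOperations.nullOnOverflow", "true")
    spark.sql("select cast(9e37 as decimal(38,0)) * cast(10 as decimal(38,0))").show()  // NULL

    // With the option turned off, the SQL ANSI 2011 behavior is followed
    spark.conf.set("spark.sql.decimalOperations.nullOnOverflow", "false")
    spark.sql("select cast(9e37 as decimal(38,0)) * cast(10 as decimal(38,0))").show()  // throws ArithmeticException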

How was this patch tested?

added UTs

} else {
  val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
  if (nullOnOverflow) {
    logWarning(s"$message NULL is returned.")
Contributor

I am not sure if we should log this message. If we hit this often we'll end up with huge logs.

Contributor Author

If we hit it often, the result we get is quite useless. I added it only to notify the user of an unexpected/undesired situation which currently happens silently. I think it is bad that the user cannot know whether a NULL is the result of an operation involving NULLs or the result of an overflow.

Contributor

I agree that a result becomes less useful if we return nulls often. My problem is more that if we process a million non-convertible decimals we log the same message a million times, which is going to cause a significant regression. Moreover, this is logged on the executor, and an end-user typically does not look at those logs (there is also no reason to do so, since the job does not throw an error).

My suggestion would be to not log at all, or just log once. I prefer not to log at all.

Contributor Author

I see your point, and I agree with you, but I wanted to leave some trace of what was happening. What about using DEBUG as the log level? In that case we are not logging anything most of the time, but if we want to check whether an overflow is happening we can. What do you think?

Contributor

I am ok with using debug/trace level logging. Can you make sure we do not construct the message unless we are logging or throwing the exception (changing val into def should be enough)?
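Something along these lines (a sketch only; it assumes the surrounding members from the quoted snippet and that the overflow branch returns null, which is an assumption here). Spark's logDebug takes its message argument by name, so with a def the string is only built when we actually log at DEBUG level or throw:

    // Sketch: `def` instead of `val`, so the message is not constructed eagerly
    def message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
    if (nullOnOverflow) {
      logDebug(s"$message NULL is returned.")
      null
    } else {
      throw new ArithmeticException(message)
    }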

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86484 has finished for PR 20350 at commit 449b69c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CheckOverflow(
  • final class Decimal extends Ordered[Decimal] with Serializable with Logging

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86488 has finished for PR 20350 at commit fcd665e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

Jenkins, retest this please

@mgaido91
Contributor Author

cc @gatorsmile @cloud-fan

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86495 has finished for PR 20350 at commit fcd665e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -49,7 +49,6 @@ select 1e35 / 0.1;

-- arithmetic operations causing a precision loss are truncated
select 123456789123456789.1234567890 * 1.123456789123456789;
select 0.001 / 9876543210987654321098765432109876543.2
Member

I think it is missing a ; before...

Contributor Author

yes, unfortunately I missed it somehow previously...

@SparkQA

SparkQA commented Jan 23, 2018

Test build #86524 has finished for PR 20350 at commit 610a595.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.doc("When true (default), if an overflow on a decimal occurs, then NULL is returned. " +
"Spark's older versions and Hive behave in this way. If turned to false, SQL ANSI 2011 " +
"specification, will be followed instead: an arithmetic exception is thrown. This is " +
"what most of the SQL databases do.")
Contributor

Tiny nit:

If turned to false, SQL ANSI 2011 specification, will be followed instead

This should be

If turned to false, SQL ANSI 2011 specification will be followed instead

@SparkQA

SparkQA commented Jan 23, 2018

Test build #86528 has finished for PR 20350 at commit c73471d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • final class Decimal extends Ordered[Decimal] with Serializable

@SparkQA

SparkQA commented Jan 23, 2018

Test build #86533 has finished for PR 20350 at commit 2c8e2c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

kindly ping @gatorsmile @cloud-fan

@gatorsmile
Member

Thanks for your contributions! Could you ping us again after 2.3 release?

@mgaido91
Contributor Author

sure, thanks @gatorsmile

@SparkQA

SparkQA commented Feb 7, 2018

Test build #87154 has finished for PR 20350 at commit bd8b645.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Feb 28, 2018

retest this please

@SparkQA

SparkQA commented Feb 28, 2018

Test build #87794 has finished for PR 20350 at commit bd8b645.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

mgaido91 commented Feb 28, 2018

The error is unrelated, and I am seeing it frequently in other builds. It seems something caused the flakiness of this test to increase. There is already a ticket for it: SPARK-23369, but it is becoming more and more important to fix it. It would also be great to check what increased the flakiness...

@kiszk
Member

kiszk commented Mar 1, 2018

retest this please

@mgaido91
Contributor Author

mgaido91 commented Mar 1, 2018

sorry @gatorsmile, now that RC for 2.3 has passed the vote, do you happen to have time to look at this? Thanks.

@SparkQA

SparkQA commented Mar 1, 2018

Test build #87841 has finished for PR 20350 at commit bd8b645.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Sure, will do the review in the next few days.

@SparkQA

SparkQA commented Jun 19, 2018

Test build #92087 has finished for PR 20350 at commit 069b861.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93072 has finished for PR 20350 at commit 069b861.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

My understanding from #21499 (comment) is that the plan you have in mind is to have this in 3.0 and not in 2.4 @gatorsmile , am I right? If this is the case, shall I close this now and reopen once 2.4 is out? Thanks.

@SparkQA

SparkQA commented Jul 22, 2018

Test build #93401 has finished for PR 20350 at commit 069b861.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mickjermsurawong-stripe
Contributor

hi @mgaido91! I'm hoping to have this feature in 3.0 too. Thank you for the work here :)
So +1 on this getting picked up and merged!

For now I'm trying to cherry-pick this to our local fork. I'd love your input here: how do you think we should handle the overflow problem when our dataset doesn't involve any arithmetic operation on a SQL type? When it is just round-tripping between a JVM BigDecimal and a SQL Decimal, we can still get back null.
I'm thinking that I would need a similar check at encoding/decoding time (the ScalaReflection code that builds the deserializer).

Concretely, this PR nicely fixes the arithmetic case, throwing an exception instead of returning null when the option is enabled:

    val result = spark
      .sql(s"select cast(${smallDecimalWithFullPrecision} as decimal(38, 38)) + 1")
      .first()
    result shouldEqual Row(null)

However, dataset round-tripping here will still return null:

    val result: Seq[BigDecimal] = spark
      .createDataset(Seq(BigDecimal("123456789" * 4)))(ExpressionEncoder[BigDecimal])
      .map(identity(_))(ExpressionEncoder[BigDecimal])
      .collect()
    result shouldEqual Seq(null)

@mgaido91
Contributor Author

mgaido91 commented Jun 21, 2019

@mickjermsurawong-stripe thanks for your comment. I am updating this PR to resolve the conflicts, and I hope that your feedback will help this PR move forward.

As far as your question is concerned, you may consider adding an AssertNotNull to the output of the decoding, in order to get an exception in case that conversion fails. This is not really feasible, of course, if your input BigDecimal can contain null, i.e. if it is an Option[BigDecimal] you want to get in SQL. Another option which may work for you is what this PR now does, after my update, in the RowEncoder. The check there may solve your issue too.
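For completeness, a user-level approximation of that idea (a hypothetical sketch, not the catalyst-level AssertNotNull and not this PR's code): fail fast during the round-trip as soon as a null shows up where the input was known to be non-null.

    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

    // Sketch: the overflowing value is encoded to null by createDataset, so the
    // map sees null and can raise an error instead of silently propagating it.
    val checked = spark
      .createDataset(Seq(BigDecimal("123456789" * 4)))(ExpressionEncoder[BigDecimal])
      .map { d =>
        if (d == null) throw new ArithmeticException("decimal overflowed during encoding")
        d
      }(ExpressionEncoder[BigDecimal])

    checked.collect()  // throws instead of returning Seq(null)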

@SparkQA

SparkQA commented Jun 21, 2019

Test build #106780 has finished for PR 20350 at commit 37f47ef.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 22, 2019

Test build #106782 has finished for PR 20350 at commit bc25c0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1441,6 +1441,16 @@ object SQLConf {
.booleanConf
.createWithDefault(true)

val DECIMAL_OPERATIONS_NULL_ON_OVERFLOW =
buildConf("spark.sql.decimalOperations.nullOnOverflow")
Contributor

Overflow can happen with non-decimal operations too; do we need a new config?

cc @JoshRosen

Contributor Author

Thanks for taking a look at this, @cloud-fan!

Yes, the non-decimal case is handled in #21599. I'd say that, in the non-decimal case, the situation is pretty different: overflow in decimal operations is handled by Spark today, by converting overflowing results to null, while overflow in non-decimal operations isn't handled at all currently.

In non-decimal operations we indeed return a wrong value (the Java way). So IMHO the current non-decimal behavior doesn't make any sense (considering this is SQL and not a low-level language like Java/Scala), and keeping it makes no sense either (we already discussed this in that PR actually).

Contributor

A DB does not have to follow the SQL standard completely in every corner. The current behavior in Spark is by design and I don't think it's nonsense.

I do agree that it's a valid requirement that some users want overflow to fail, but it should be protected by a config.

My question is whether we need one config for overflow, or 2 configs for decimal and non-decimal.

Contributor Author

A DB does not have to follow the SQL standard completely in every corner. The current behavior in Spark is by design and I don't think it's nonsense.

I am sorry, but I don't really agree with you on this. I know the discussion is a bit off-topic, but I'd just like to explain the reasons for my opinion. SQL is a declarative language, and here we are coupling the result/behavior to the specific execution language we are using. Spark is cross-language, but for arithmetic operations overflow follows the very peculiar behavior of the implementation language, which is:

  • against the SQL standard, and no other DB deviates from the standard in this respect, so it is (at least) very surprising for SQL users;
  • different from what happens in Python and R when you overflow in those languages (an int becomes a long and so on there).

So there is no Spark user other than Scala/Java ones who might understand the behavior Spark has in those cases. Sorry for being a bit off-topic, anyway.

My question is whether we need one config for overflow, or 2 configs for decimal and non-decimal.

Yes, this is the main point here. IMHO I'd prefer 2 configs, because when the config is turned off the behavior is completely different: in one case we return null, in the other we return wrong results. But I also see the value in keeping the number of configs, which is already pretty big, as small as possible. So I'd prefer 2 configs, but if you and the community think 1 is better, I can update the PR to make this config more generic.

Thanks for your feedback and the discussion!

Contributor

For now, I think separate flags are okay. Here's why:

  • While eventually we probably want to add flaggable non-Decimal overflow detection (see [SPARK-26218][SQL] Overflow on arithmetic operations returns incorrect result #21599 (comment)), these PRs should land separately (to limit scope of changes / code review). If we give this PR's flag a generic name, merge this PR, and then somehow fail to merge the integer overflow PR in time for 3.0 then we'd be facing a situation where we'd need to change the behavior of a released flag if we later merge the non-Decimal overflow PR.
  • If we implement separate flags for each type of overflow then that doesn't preclude us from later introducing a single flag which is used as the default value for the per-type flags.

I'm interested in whichever option allows us to make incremental progress by getting this merged (even if flagged off by default) so that we can rely on this functionality being available in 3.x instead of having to maintain it indefinitely in our own fork (with all of the associated long-term maintenance and testing burdens).

Contributor

One followup question regarding flag naming: is "overflow" the most precise term for the change made here? Or does this flag also change behavior in precision-loss scenarios? Maybe I'm getting tripped up on terminology here, since insufficient precision to represent small fractional quantities is essentially an "overflow" of the digit space reserved to represent the fractional part.

Contributor Author

Thanks for your comments @JoshRosen.
Yes, this deals with the overflow case. The underflow (or precision loss) case is handled in a different way, and the behavior depends on another config (see SPARK-22036): it either avoids precision loss, eventually causing overflow (old behavior), or truncates (as defined by the SQL standard, closely following the SQL Server behavior from which we derived our decimal operations implementation). So this flag is related only to the overflow case.
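To illustrate the distinction with a small sketch (the first query is taken from the SQL tests touched by this PR; the second is just an arbitrary overflowing expression):

    // Precision loss: the result still fits in precision 38, the extra
    // fractional digits are truncated (governed by SPARK-22036, not this flag).
    spark.sql("select 123456789123456789.1234567890 * 1.123456789123456789").show()

    // Overflow: the integral part no longer fits in 38 digits, so the result is
    // NULL today, or an ArithmeticException when nullOnOverflow is set to false.
    spark.sql("select cast(9e37 as decimal(38,0)) * cast(10 as decimal(38,0))").show()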

JoshRosen changed the title from "[SPARK-23179][SQL] Support option to throw exception if overflow occurs" to "[SPARK-23179][SQL] Support option to throw exception if overflow occurs during Decimal arithmetic" on Jun 25, 2019
@mickjermsurawong-stripe
Contributor

hi @mgaido91, I found two other places where we might want to add this check for consistent behavior.

@mgaido91
Contributor Author

@mickjermsurawong-stripe let me answer your two comments separately:

At the encoder level (thank you for the addition on RowEncoder in this PR). We may similarly need it for consistency here at SerializerBuildHelper

Not really: when we serialize a SQL decimal to a Java/Scala BigDecimal, we cannot have overflow, so this patch/case doesn't apply (as you can see, there is not even a CheckOverflow there).

When we use agg(sum) the aggregation

Thanks for pointing this out! Actually this is a weird operator, because it throws an exception on overflow in interpreted mode, while in codegen it doesn't. Moreover, it throws an IllegalArgumentException instead of an ArithmeticException. Since this is a weird behavior of the operator itself, I'd prefer a separate PR targeting that specific operator and addressing its behavior as a whole, since it needs to be revisited carefully. Are there concerns about this plan?

@cloud-fan
Contributor

I'd prefer a separate PR targeting that specific operator and addressing its behavior as a whole

SGTM

I'm merging this PR, thanks!

cloud-fan closed this in 3139d64 on Jun 27, 2019
@mgaido91
Contributor Author

thanks @cloud-fan and thank you all for the reviews!

@mickjermsurawong-stripe
Contributor

@mgaido91,

when we serialize a SQL decimal to a Java/Scala BigDecimal, we cannot have overflow

I think we should catch overflow when encoding a Java/Scala BigDecimal to a SQL Decimal (not the other way round), and that happens at the SerializerBuildHelper.
If there's no check at serialization, the Decimal will still wrap the java/scala BigDecimal until UnsafeRowWriter, where a null can be silently introduced. The proposal is to catch this earlier, at the encoding part.

With the current test structure in ExpressionEncoderSuite, this would fail with an NPE on the round-tripped results:

encodeDecodeTest(BigDecimal("9" * 21), "overflowing big decimal")

To pinpoint the encoder part, the new test here shows a null row for the decimal type:

  test("big decimal exceeding precision serialized to row") {
    val overflowing = BigDecimal("9" * 21)
    val encoder = ExpressionEncoder[BigDecimal]
    val row = encoder.toRow(overflowing)
    assert(row.get(0, DecimalType.SYSTEM_DEFAULT) === null)
  }

I can make a separate PR on this if this sounds good to you.

@JoshRosen
Contributor

JoshRosen commented Jun 27, 2019

Actually this is a weird operator, because it throws an exception on overflow in interpreted mode, while in codegen it doesn't. Moreover, it throws an IllegalArgumentException instead of an ArithmeticException. Since this is a weird behavior of the operator itself, I'd prefer a separate PR targeting that specific operator and addressing its behavior as a whole, since it needs to be revisited carefully. Are there concerns about this plan?

+1; it sounds like the difference between the codegen and interpreted paths is a separate, pre-existing bug. It's also especially hard to reason about because (AFAIK) the paths in DecimalAggregates are only used for certain sizes of decimals, so the behavioral inconsistency is triggered by a combination of precision/scale and codegen/interpreted (which is really confusing!).

For consistency, I think we should:

  1. Ensure that the agg(sum) codegen and interpreted paths behave the same w.r.t nullOnOverflow == true (the default / 2.x behavior).
  2. Respect the nullOnOverflow flag in agg(sum) codegen.

Let's make a followup JIRA for this change and a separate JIRA for the encoder changes @mickjermsurawong-stripe discussed in his comment (I can loop back later this morning or afternoon to help file these).

Edit: in @mickjermsurawong-stripe's PR we can improve test coverage for both sets of encoders (RowEncoder and ExpressionEncoder / ScalaReflection), since AFAIK we don't have dedicated unit tests for overflow detection in the encoders (this PR did improve things, but I don't think it's directly tested; if that change is covered here, then I think it's done a bit indirectly via another test).

@mgaido91
Contributor Author

I see now, sorry for misunderstanding @mickjermsurawong-stripe. I think it is fine to go ahead with your PR. I created https://issues.apache.org/jira/browse/SPARK-28200 for it. So please go ahead and submit your PR for that JIRA.

I created also https://issues.apache.org/jira/browse/SPARK-28201 for the MakeDecimal case. I'll work on it ASAP.

Thanks!

pull bot pushed a commit to Pandinosaurus/spark that referenced this pull request Jul 1, 2019
## What changes were proposed in this pull request?

In SPARK-23179, a flag was introduced to control the behavior in case of overflow on decimals. The behavior is: return `null` when `spark.sql.decimalOperations.nullOnOverflow` is true (the default and traditional Spark behavior); throw an `ArithmeticException` when that conf is false (according to the SQL standard and other DBs' behavior).

`MakeDecimal` so far had an inconsistent behavior: in codegen mode it returned `null` like the other operators, but in interpreted mode it threw an `IllegalArgumentException`.

The PR aligns `MakeDecimal`'s behavior with that of the other operators as defined in SPARK-23179. Now both modes either return `null` or throw an `ArithmeticException`, according to the value of `spark.sql.decimalOperations.nullOnOverflow`.

Credits for this PR to mickjermsurawong-stripe who pointed out the wrong behavior in apache#20350.

## How was this patch tested?

improved UTs

Closes apache#25010 from mgaido91/SPARK-28201.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
mapr-devops pushed a commit to mapr/spark that referenced this pull request Jul 5, 2019
## What changes were proposed in this pull request?

- Currently, `ExpressionEncoder` does not handle bigdecimal overflow. Round-tripping overflowing java/scala BigDecimal/BigInteger returns null.
  - The serializer encodes a java/scala BigDecimal to a sql Decimal, which still holds the underlying data of the former.
  - When writing out to UnsafeRow, `changePrecision` will return false and the row will hold a null value.
https://github.com/apache/spark/blob/24e1e41648de58d3437e008b187b84828830e238/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java#L202-L206
- In [SPARK-23179](apache#20350), an option to throw an exception on decimal overflow was introduced.
- This PR adds an option in `ExpressionEncoder` to throw when an overflowing BigDecimal/BigInteger is detected, before its corresponding Decimal gets written to the Row. This gives consistent behavior between decimal arithmetic on sql expressions (DecimalPrecision) and getting decimals from a dataframe (RowEncoder).

Thanks to mgaido91 for the very first PR `SPARK-23179` and follow-up discussion on this change.
Thanks to JoshRosen for working with me on this.

## How was this patch tested?

added unit tests

Closes apache#25016 from mickjermsurawong-stripe/SPARK-28200.

Authored-by: Mick Jermsurawong <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>