Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder #25016

Conversation

mickjermsurawong-stripe
Copy link
Contributor

@mickjermsurawong-stripe mickjermsurawong-stripe commented Jun 30, 2019

What changes were proposed in this pull request?

  • Currently, ExpressionEncoder does not handle bigdecimal overflow. Round-tripping overflowing java/scala BigDecimal/BigInteger returns null.
  • In SPARK-23179, an option to throw exception on decimal overflow was introduced.
  • This PR adds the option in ExpressionEncoder to throw when detecting overflowing BigDecimal/BigInteger before its corresponding Decimal gets written to Row. This gives a consistent behavior between decimal arithmetic on sql expression (DecimalPrecision), and getting decimal from dataframe (RowEncoder)

Thanks to @mgaido91 for the very first PR SPARK-23179 and follow-up discussion on this change.
Thanks to @JoshRosen for working with me on this.

How was this patch tested?

added unit tests

@JoshRosen JoshRosen changed the title [SPARK-28200] Decimal overflow handling [SPARK-28200] Decimal overflow handling in ExpressionEncoder Jun 30, 2019
@JoshRosen
Copy link
Contributor

JoshRosen commented Jun 30, 2019

@mickjermsurawong-stripe, I think we also had a test case for RowEncoder? Could you also submit that change (in RowEncoderSuite.scala) as part of this PR? Even though we're not actually changing RowEncoder behavior here, the old code was lacking explicit test coverage for that path, so it'd be good to pull in that change to strengthen the tests

@JoshRosen
Copy link
Contributor

jenkins this is ok to test

@JoshRosen JoshRosen changed the title [SPARK-28200] Decimal overflow handling in ExpressionEncoder [SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder Jun 30, 2019
@JoshRosen JoshRosen added the SQL label Jun 30, 2019
testOverflowingBigNumeric(new BigInteger("9" * 100), "java very big int")

private def testOverflowingBigNumeric[T: TypeTag](bigDecimal: T, testName: String): Unit = {
for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seq("true", "false").foreach

@SparkQA
Copy link

SparkQA commented Jun 30, 2019

Test build #107048 has finished for PR 25016 at commit b3740ba.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 30, 2019

Test build #107052 has finished for PR 25016 at commit 0ed62b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

encodeDecodeTest(new java.math.BigDecimal(("9" * 20) + "." + "9" * 18),
"java decimal within precision/scale limit")

encodeDecodeTest(BigDecimal(("9" * 20) + "." + "9" * 18).unary_-,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use -BigDecimal... instead of .unary_-

Copy link
Contributor

@mgaido91 mgaido91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM, just some style comments on the tests. Thanks.

Anyway, may you please also update the description of the PR? Because actually, the point here is not that we are not honoring the newly introduced flag, but previously we were not checking/handling at all the overflow in these situations. So I think we best should make this more clear in the PR description. Thanks.

}
}

withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> true.toString) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> true.toString) {
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> "true") {

}

private def testDecimalOverflow(schema: StructType, row: Row): Unit = {
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> false.toString) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> false.toString) {
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> "false") {

@@ -162,6 +162,32 @@ class RowEncoderSuite extends CodegenInterpretedPlanTest {
assert(row.toSeq(schema).head == decimal)
}

test("RowEncoder should respect nullOnOverflow for decimals") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may you please also add the Jira number to the test case? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. thank you Marco!

@SparkQA
Copy link

SparkQA commented Jul 1, 2019

Test build #107062 has finished for PR 25016 at commit 2513ea7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 1, 2019

Test build #107081 has finished for PR 25016 at commit 87073da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 683e270 Jul 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants