
[BUG] cast(9.95 as decimal(3,1)), actual: 9.9, expected: 10.0 #10809

Closed
Tracked by #10771
gerashegalov opened this issue May 13, 2024 · 11 comments · Fixed by #10917
Labels
bug Something isn't working

Comments

@gerashegalov
Collaborator

Repro

 ~/dist/spark-3.3.0-bin-hadoop3/bin/spark-shell \
   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
   --conf spark.rapids.sql.test.enabled=true \
   --conf spark.rapids.sql.explain=ALL \
   --jars dist/target/rapids-4-spark_2.12-24.06.0-SNAPSHOT-cuda11.jar
scala> val df = Seq(9.95).toDF.coalesce(1)
df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: double]

scala> spark.conf.set("spark.rapids.sql.enabled", true)

scala> df.selectExpr("cast(value as decimal(3,1))").collect()
24/05/13 11:34:42 WARN GpuOverrides: 
*Exec <ProjectExec> will run on GPU
  *Expression <Alias> cast(value#1 as decimal(3,1)) AS value#5 will run on GPU
    *Expression <Cast> cast(value#1 as decimal(3,1)) will run on GPU
  *Exec <CoalesceExec> will run on GPU
    ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
      @Expression <AttributeReference> value#1 could run on GPU

res2: Array[org.apache.spark.sql.Row] = Array([9.9])

scala> spark.conf.set("spark.rapids.sql.enabled", false)

scala> df.selectExpr("cast(value as decimal(3,1))").collect()
res4: Array[org.apache.spark.sql.Row] = Array([10.0])

Related to #9682

@gerashegalov gerashegalov added bug Something isn't working ? - Needs Triage Need team to review and classify labels May 13, 2024
@mattahrens
Collaborator

mattahrens commented May 21, 2024

Scope: turn the float-to-decimal cast feature flag off by default and update the documentation accordingly with this example. Ref: spark.rapids.sql.castFloatToDecimal.enabled. Check whether the supported operators in tools need to be updated.
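A minimal sketch of what opting back in would look like once the flag is off by default, assuming the existing `spark.rapids.sql.castFloatToDecimal.enabled` config is the switch (command shape borrowed from the repro above):

```shell
# Users who accept the float-to-decimal rounding differences documented in
# this issue can explicitly re-enable the cast on the GPU:
~/dist/spark-3.3.0-bin-hadoop3/bin/spark-shell \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.castFloatToDecimal.enabled=true
```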

@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label May 21, 2024
@ttnghia
Collaborator

ttnghia commented May 24, 2024

Filed a cudf issue: rapidsai/cudf#15862

@gerashegalov
Collaborator Author

gerashegalov commented May 24, 2024

Also take a look at the discussion around #9682 (comment) and above. Note that 9.95 is not exactly representable as a double:

$ jshell 
|  Welcome to JShell -- Version 21.0.2
|  For an introduction type: /help intro

jshell> new BigDecimal(9.95)
$8 ==> 9.949999999999999289457264239899814128875732421875

jshell> new BigDecimal(9.949999999999999289457264239899814128875732421875)
$9 ==> 9.949999999999999289457264239899814128875732421875

jshell> new BigDecimal(9.95).setScale(1, BigDecimal.ROUND_HALF_UP)
$11 ==> 9.9

jshell> new BigDecimal(String.valueOf(9.95)).setScale(1, BigDecimal.ROUND_HALF_UP)
$1 ==> 10.0

jshell> new BigDecimal("9.95").setScale(1, BigDecimal.ROUND_HALF_UP)
$12 ==> 10.0
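The jshell session above can be condensed into a small, compilable Java example (a minimal sketch; `RoundingMode.HALF_UP` replaces the deprecated `BigDecimal.ROUND_HALF_UP` constant used in the transcript):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class CastRepro {
    public static void main(String[] args) {
        // 9.95 has no exact double representation; the nearest double is
        // 9.9499999999999992894..., so half-up rounding at scale 1 yields 9.9.
        BigDecimal fromDouble =
            new BigDecimal(9.95).setScale(1, RoundingMode.HALF_UP);

        // Going through the shortest decimal string ("9.95") first recovers
        // the literal the user typed, so the same rounding yields 10.0.
        BigDecimal fromString =
            new BigDecimal(String.valueOf(9.95)).setScale(1, RoundingMode.HALF_UP);

        System.out.println(fromDouble); // 9.9
        System.out.println(fromString); // 10.0
    }
}
```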

@ttnghia
Collaborator

ttnghia commented May 24, 2024

Okay, after reading through issue #9682, I realize that this issue is another instance of it.

@thirtiseven
Collaborator

thirtiseven commented May 25, 2024

I tried the float => string => decimal path (it is very easy to implement in the plugin). It passes the Spark UTs, but there are still some differences due to the known limits of the Ryu float-to-string conversion and a very rare edge case in string-to-decimal (#10890). I will post a PR and share some results for review next week, but I'm not sure whether the diffs are acceptable or whether the original approach can match Spark better.
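The float => string => decimal idea can be sketched in plain Java (a hypothetical helper, not the plugin's actual implementation; `Double.toString` stands in for the Ryu shortest-round-trip conversion that cudf uses on the GPU, and half-up rounding is assumed to match Spark's cast semantics):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class FloatStringDecimal {
    // Sketch: round-trip the double through its shortest decimal string
    // before rounding to the target scale, so the result matches the
    // literal the user wrote rather than the exact binary value.
    static BigDecimal castViaString(double value, int scale) {
        // Double.toString produces the shortest string that round-trips
        // back to the same double, e.g. "9.95" rather than
        // "9.9499999999999992894...".
        return new BigDecimal(Double.toString(value))
                .setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(castViaString(9.95, 1)); // 10.0, matching Spark CPU
    }
}
```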

@ttnghia
Collaborator

ttnghia commented May 25, 2024

That sounds good. I've also found a way to implement this in C++ that is very efficient, but I'm not sure whether it will pass the integration tests. I'll verify that and post a PR too.

If you have a chance, please list the related tests that I can run to verify.

@ttnghia
Collaborator

ttnghia commented May 28, 2024

@thirtiseven Please test #10917 to see if you get any test failures. On my local env, all the unit tests and integration tests passed, but I'm not sure if I missed anything.

@GaryShen2008
Collaborator

Hi @ttnghia, rapidsai/cudf#15905 has been merged. Is this issue now ready to be fixed using that new approach?

@ttnghia
Collaborator

ttnghia commented Jul 15, 2024

No, that is not what we want. We may need something from pmattione-nvidia/cudf#2, which will be added into NVIDIA/spark-rapids-jni#2078; that is what we actually need. I will work on it this week.

@GaryShen2008
Collaborator

GaryShen2008 commented Jul 29, 2024

Hi @ttnghia, can we close this issue now that your PR has fixed it?

@ttnghia
Collaborator

ttnghia commented Jul 29, 2024

Thanks @GaryShen2008. The fix in JNI will be picked up by spark-rapids in #10917, so this issue will be closed by that PR.
