
[BUG] test_case_when fails with DATAGEN_SEED=1698940723 #9685

Closed
abellina opened this issue Nov 13, 2023 · 0 comments · Fixed by #9852
Labels: bug (Something isn't working)

abellina (Collaborator) commented on Nov 13, 2023

Relates to #9684

Like the issue linked above, this test also fails with a duplicate map key, and setting spark.sql.mapKeyDedupPolicy=LAST_WIN works around it.
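The two dedup policies can be sketched in plain Python (an illustrative model, not Spark's actual ArrayBasedMapBuilder; `build_map` and `_norm_key` are hypothetical helpers). Spark collapses all NaN map keys to one canonical NaN, so two generated NaN keys collide: the default EXCEPTION policy then raises, while LAST_WIN keeps the value inserted last.

```python
import math

# Canonical stand-in for any NaN key. Raw float NaNs would never collide
# as dict keys on their own, since NaN != NaN.
_NAN = object()

def _norm_key(key):
    """Collapse every float NaN to one canonical key, as Spark's map builder does."""
    if isinstance(key, float) and math.isnan(key):
        return _NAN
    return key

def build_map(pairs, policy="EXCEPTION"):
    """Illustrative sketch of spark.sql.mapKeyDedupPolicy semantics.

    EXCEPTION (the default) raises on a duplicate key; LAST_WIN keeps the
    value inserted last. This is a model, not Spark's real implementation.
    """
    result = {}
    for key, value in pairs:
        k = _norm_key(key)
        if k in result and policy == "EXCEPTION":
            shown = "NaN" if k is _NAN else key
            raise RuntimeError(f"Duplicate map key {shown} was found")
        result[k] = value  # LAST_WIN: later value overwrites the earlier one
    return result
```

Under this model, `[(NaN, 1.0), (NaN, 2.0)]` raises with the default policy and yields a single entry with value 2.0 under LAST_WIN, matching the error text below.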

Local repro:

SPARK_RAPIDS_TEST_DATAGEN_SEED=1698940723 ./run_pyspark_from_build.sh -k test_case_when\ and\ Map\ and\ double\ and\ not_null
[2023-11-02T17:07:56.115Z] FAILED ../../src/main/python/conditionals_test.py::test_case_when[Map(Double(not_null),Double)][DATAGEN_SEED=1698940723, INJECT_OOM] - py4j.protocol.Py4JJavaError: An error occurred while calling o34590.collectToPython.
[2023-11-02T17:07:56.115Z] : java.lang.RuntimeException: Duplicate map key NaN was found, please check the input data. If you want to remove the duplicated keys, you can set spark.sql.mapKeyDedupPolicy to LAST_WIN so that the key inserted at last takes precedence.
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.util.ArrayBasedMapBuilder.put(ArrayBasedMapBuilder.scala:72)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.expressions.CreateMap.eval(complexTypeCreator.scala:229)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:66)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:54)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:317)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:317)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:322)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:415)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:322)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:322)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:322)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDown$1(QueryPlan.scala:94)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
[2023-11-02T17:07:56.115Z]  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
[2023-11-02T17:07:56.115Z]  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
[2023-11-02T17:07:56.115Z]  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2023-11-02T17:07:56.115Z]  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
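As the error message above suggests, the workaround can be applied at the session level. A minimal config sketch (assumes a local PySpark environment; the conf name is taken directly from the error text):

```python
from pyspark.sql import SparkSession

# Build a session with the LAST_WIN dedup policy, so duplicate map keys
# (e.g. two NaN keys) keep the last value instead of raising.
spark = (SparkSession.builder
         .appName("mapKeyDedupPolicy-workaround")
         .config("spark.sql.mapKeyDedupPolicy", "LAST_WIN")
         .getOrCreate())
```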
abellina added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Nov 13, 2023
mattahrens removed the ? - Needs Triage (Need team to review and classify) label on Nov 14, 2023
thirtiseven self-assigned this on Nov 29, 2023