
Spark 3.4: Rewrite procedure throw better exception when filter expression cannot translate #8394

Merged: 4 commits into apache:master, Sep 20, 2023

Conversation

@ConeyLiu (Contributor, Author) commented Aug 25, 2023:

For the rewrite procedure, we should throw a better exception when the filter condition cannot be translated to a Spark filter or converted to an Iceberg expression. For example:

scala> spark.sql("call local.system.rewrite_data_files(table => 'db.test_rewrite', where => 'substr(data, 2) = \"fo\"')").show()
java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:529)
  at scala.None$.get(Option.scala:527)
  at org.apache.spark.sql.execution.datasources.SparkExpressionConverter$.convertToIcebergExpression(SparkExpressionConverter.scala:38)
  at org.apache.spark.sql.execution.datasources.SparkExpressionConverter.convertToIcebergExpression(SparkExpressionConverter.scala)
  at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.checkAndApplyFilter(RewriteDataFilesProcedure.java:137)
  at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.lambda$call$0(RewriteDataFilesProcedure.java:123)
  at org.apache.iceberg.spark.procedures.BaseProcedure.execute(BaseProcedure.java:107)
  at org.apache.iceberg.spark.procedures.BaseProcedure.modifyIcebergTable(BaseProcedure.java:88)
  at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.call(RewriteDataFilesProcedure.java:103)

After this PR:

scala> spark.sql("call local.system.rewrite_data_files(table => 'db.test_rewrite', where => 'substr(data, 2) = \"fo\"')").show()
java.lang.IllegalArgumentException: Cannot convert Spark filter: (data IS NOT NULL) AND ((SUBSTRING(data, 2)) = 'fo') to Iceberg expression
  at org.apache.spark.sql.execution.datasources.SparkExpressionConverter$.convertToIcebergExpression(SparkExpressionConverter.scala:43)
  at org.apache.spark.sql.execution.datasources.SparkExpressionConverter.convertToIcebergExpression(SparkExpressionConverter.scala)
  at org.apache.iceberg.spark.procedures.BaseProcedure.filterExpression(BaseProcedure.java:171)
  at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.checkAndApplyFilter(RewriteDataFilesProcedure.java:129)
  at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.lambda$call$0(RewriteDataFilesProcedure.java:118)
  at org.apache.iceberg.spark.procedures.BaseProcedure.execute(BaseProcedure.java:107)
  at org.apache.iceberg.spark.procedures.BaseProcedure.modifyIcebergTable(BaseProcedure.java:88)
  at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.call(RewriteDataFilesProcedure.java:100)

github-actions bot added the spark label Aug 25, 2023
@@ -35,7 +35,14 @@ object SparkExpressionConverter {
     // Currently, it is a double conversion as we are converting Spark expression to Spark filter
     // and then converting Spark filter to Iceberg expression.
     // But these two conversions already exist and well tested. So, we are going with this approach.
-    SparkFilters.convert(DataSourceStrategy.translateFilter(sparkExpression, supportNestedPredicatePushdown = true).get)
+    DataSourceStrategy.translateFilter(sparkExpression, supportNestedPredicatePushdown = true) match {
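The diff above is truncated at the new match. A minimal sketch of how the added branch plausibly continues; the branch bodies and message strings are inferred from the stack traces above rather than copied from the commit:

DataSourceStrategy.translateFilter(sparkExpression, supportNestedPredicatePushdown = true) match {
  case Some(filter) =>
    // SparkFilters.convert returns null for filters it cannot map to an
    // Iceberg expression, so a null check is still needed even after a
    // successful translation.
    val converted = SparkFilters.convert(filter)
    assert(converted != null, s"Cannot convert Spark filter: $filter to Iceberg expression")
    converted
  case _ =>
    // translateFilter returned None: the Catalyst expression has no Spark
    // filter equivalent, so fail with a descriptive message instead of
    // calling .get on the None.
    throw new IllegalArgumentException(s"Cannot translate Spark expression: $sparkExpression to Iceberg expression")
}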
@ConeyLiu (Author) commented Aug 25, 2023 on the diff:

We should change this to the V2 translator and V2 filter; then we could convert the system functions to Iceberg expressions after #8088 or after apache/spark#42612.

A reviewer (Contributor) replied:

Thank you, this is much better than the None.get I hit in my change.


@ConeyLiu (Author) commented:

cc @dramaticlly @RussellSpitzer @nastra @advancedxy, the code has been rebased; please take a look when you are free.

@dramaticlly (Contributor) left a comment:

Thank you @ConeyLiu, LGTM!

@@ -828,6 +828,26 @@ public void testDefaultSortOrder() {
     assertEquals("Data after compaction should not change", expectedRecords, actualRecords);
   }

+  @Test
+  public void testRewriteWithUntranslatedOrUnconvertedFilter() {
A reviewer (Contributor) commented:

nit: this could potentially be part of the existing test testRewriteDataFilesWithInvalidInputs, but I think it's also fine to leave it here.

@nastra (Contributor) left a comment:

LGTM, I think we should also apply this against Spark 3.5 now.

@advancedxy (Contributor) left a comment:

LGTM, except a minor wording comment.

@ConeyLiu (Author) commented:

> I think we should also apply this against Spark 3.5 now

@nastra, should we port the changes to the other Spark versions in a follow-up PR? The issue exists in all the other Spark versions.

@nastra (Contributor) commented Sep 19, 2023:

A follow-up PR is fine IMO (whatever you prefer).

DataSourceV2Strategy.translateFilterV2(sparkExpression) match {
  case Some(filter) =>
    val converted = SparkV2Filters.convert(filter)
    assert(converted != null, s"Cannot convert Spark filter: $filter to Iceberg expression")
A reviewer (Contributor) commented on the assert:

Sorry, I missed this. Is it normal to use assert in Scala? I'd prefer throwing an IllegalArgumentException here.

Another reviewer (Contributor) replied:

Looks like we have used assert quite often in Scala code: https://github.com/search?q=repo%3Aapache%2Ficeberg%20%20assert(&type=code

@ConeyLiu (Author) replied:

assert is common usage in Spark code. Anyway, I changed it to IllegalArgumentException to keep the same behavior as the "Cannot translate Spark expression" case.
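Putting the review thread together, a hedged sketch of the final shape of convertToIcebergExpression (method body only; the exact message strings and the null-return behavior of SparkV2Filters.convert are assumptions based on the snippet and stack trace above):

DataSourceV2Strategy.translateFilterV2(sparkExpression) match {
  case Some(filter) =>
    // Assumed: SparkV2Filters.convert returns null for predicates it
    // cannot map, so surface that as an IllegalArgumentException rather
    // than an assert, which the Scala compiler can elide.
    val converted = SparkV2Filters.convert(filter)
    if (converted == null) {
      throw new IllegalArgumentException(s"Cannot convert Spark filter: $filter to Iceberg expression")
    }
    converted
  case _ =>
    // No V2 translation exists for this Catalyst expression.
    throw new IllegalArgumentException(s"Cannot translate Spark expression: $sparkExpression to Iceberg expression")
}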

@nastra merged commit 09a5dbc into apache:master on Sep 20, 2023
37 checks passed
@ConeyLiu (Author) commented:

Thanks @nastra for merging this, and thanks @dramaticlly and @advancedxy for the review. I will submit a backport.
