Spark 3.4: Rewrite procedure throw better exception when filter expression cannot translate #8394
Conversation
@@ -35,7 +35,14 @@ object SparkExpressionConverter {
     // Currently, it is a double conversion as we are converting Spark expression to Spark filter
     // and then converting Spark filter to Iceberg expression.
     // But these two conversions already exist and well tested. So, we are going with this approach.
-    SparkFilters.convert(DataSourceStrategy.translateFilter(sparkExpression, supportNestedPredicatePushdown = true).get)
+    DataSourceStrategy.translateFilter(sparkExpression, supportNestedPredicatePushdown = true) match {
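The hunk above replaces an unchecked `Option.get` (which fails with an opaque `NoSuchElementException`) with a pattern match that can raise a descriptive error. A minimal self-contained sketch of that pattern; the `translateFilter` and `convert` stand-ins below are hypothetical placeholders, not the real Spark or Iceberg APIs:

```scala
// Sketch of the error-handling pattern in this diff. Instead of calling
// .get on the Option returned by filter translation, pattern-match and
// throw a descriptive IllegalArgumentException when translation fails.
object FilterTranslationSketch {
  // Stand-in for DataSourceStrategy.translateFilter: returns None when
  // the expression cannot be translated to a data source filter.
  def translateFilter(expr: String): Option[String] =
    if (expr.contains("=")) Some(expr) else None

  def convert(expr: String): String =
    translateFilter(expr) match {
      case Some(filter) => filter // real code would hand off to SparkFilters.convert
      case _ =>
        throw new IllegalArgumentException(
          s"Cannot translate Spark expression: $expr to data source filter")
    }
}
```

With this shape, an untranslatable expression fails with a message naming the offending expression rather than a bare `None.get` stack trace.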
We should change this to V2 translator and V2 filter. Then we could convert the system functions to Iceberg expression after #8088 or after apache/spark#42612.
Thank you. I think this is much better than the `null.get` I saw in my change.
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Force-pushed from 62d223f to 83c1939.
cc @dramaticlly @RussellSpitzer @nastra @advancedxy, the code has been rebased. Please take a look when you are free.
Thank you @ConeyLiu, LGTM!
@@ -828,6 +828,26 @@ public void testDefaultSortOrder() {
     assertEquals("Data after compaction should not change", expectedRecords, actualRecords);
   }

+  @Test
+  public void testRewriteWithUntranslatedOrUnconvertedFilter() {
nit: this could potentially be part of the existing test testRewriteDataFilesWithInvalidInputs, but I think it's also fine to leave it here.
LGTM, I think we should also apply this against Spark 3.5 now
LGTM, except a minor wording comment.
...ark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkExpressionConverter.scala
@nastra should we port the changes to the other Spark versions in a follow-up PR? The same code exists in all other Spark versions.
A follow-up PR is fine IMO (whatever you prefer).
DataSourceV2Strategy.translateFilterV2(sparkExpression) match {
  case Some(filter) =>
    val converted = SparkV2Filters.convert(filter)
    assert(converted != null, s"Cannot convert Spark filter: $filter to Iceberg expression")
Sorry, I missed this. Is it normal to use `assert` in Scala? I would prefer throwing an IllegalArgumentException here.
It looks like we have used them quite often in Scala code: https://github.com/search?q=repo%3Aapache%2Ficeberg%20%20assert(&type=code
`assert` is common usage in Spark code. Anyway, I changed it to IllegalArgumentException to keep the same behavior as the "Cannot translate Spark expression" case.
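The difference between the two choices can be sketched without Spark on the classpath. A minimal sketch, where `convert` is a hypothetical stand-in for `SparkV2Filters.convert` returning null for unconvertible filters; the point is that Scala's `assert` throws `AssertionError` and can be compiled away with `-Xdisable-assertions`, while the explicit check below always fires:

```scala
// Sketch of replacing Scala's `assert` with an explicit
// IllegalArgumentException, matching the "Cannot convert Spark filter"
// message used in the PR. Not the actual Iceberg code.
object ConvertCheckSketch {
  // Stand-in for SparkV2Filters.convert: returns null when the filter
  // cannot be converted to an Iceberg expression.
  private def convert(filter: String): String =
    if (filter.startsWith("id")) s"ref($filter)" else null

  def toIcebergExpression(filter: String): String = {
    val converted = convert(filter)
    if (converted == null) {
      // Unlike assert, this check cannot be disabled at compile time
      // and throws the same exception type as the translation failure.
      throw new IllegalArgumentException(
        s"Cannot convert Spark filter: $filter to Iceberg expression")
    }
    converted
  }
}
```

Using IllegalArgumentException in both the translation and conversion failure paths gives callers one exception type to handle for invalid filter input.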
Thanks @nastra for merging this, and thanks @dramaticlly @advancedxy for the review. I will submit a backport.
For the rewrite procedure, we should throw better exceptions when the filter condition cannot be pushed down or cannot be converted to an Iceberg expression. For example:
After this PR: