
Spark: use where condition to control the execution of rewriteDataFiles for procedure rewrite data files in spark sql #6759

Closed
ludlows opened this issue Feb 7, 2023 · 0 comments · Fixed by #6760


ludlows commented Feb 7, 2023

Feature Request / Improvement

Improvement

Background

It is well known that rewriteDataFiles should be run periodically.
In our production environment, we want to run rewriteDataFiles on an Iceberg table once a month using the Spark SQL procedure rewrite_data_files.

For convenience, we add the following SQL command to each daily ETL job:
call catalog.system.rewrite_data_files(table=>'hive.iceberg_table', where => "load_date > '$LASTMONTH' and load_date < '$CURRENTMONTH' and substr('$TODAY', 7, 2) = '03'")
For instance, when $TODAY = '20230208', the where condition is always false, so we expected rewrite_data_files to exit directly.
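Until such a short-circuit exists, the date gate can be evaluated on the driver side instead of inside `where`. The sketch below is illustrative only (the table name, variable names, and a live `SparkSession` are assumed):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative workaround: evaluate the day-of-month gate in Scala and
// only call the procedure on the 3rd, so `where` never carries an
// always-false predicate.
def monthlyRewrite(spark: SparkSession, today: String,
                   lastMonth: String, currentMonth: String): Unit = {
  // day-of-month from a 'yyyyMMdd' string, e.g. "20230208" -> "08"
  if (today.substring(6, 8) == "03") {
    spark.sql(
      s"""CALL catalog.system.rewrite_data_files(
         |  table => 'hive.iceberg_table',
         |  where => "load_date > '$lastMonth' and load_date < '$currentMonth'")""".stripMargin)
  }
}
```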

In other words, we get an exception when executing this SQL:
call catalog.system.rewrite_data_files(table=>'hive.iceberg_table', where => " '01'='03' ")
It is an AnalysisException, raised by the Scala line below, because the Option produced by filtering on the where condition is empty.

}.getOrElse(throw new AnalysisException("Failed to find filter expression"))

Our Request

So, would it be possible to make rewrite_data_files exit directly, without throwing an exception, when the where condition is deterministically false?
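One possible shape for such a short-circuit, sketched against Spark's Catalyst API (this is an illustrative guard, not necessarily what the linked fix does): if the resolved `where` expression is deterministic and foldable, it can be evaluated once without any input row, and a constant-false result would let the procedure return an empty result instead of throwing.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.types.BooleanType

// Illustrative guard: detect a `where` filter that can never match.
// A deterministic, foldable expression can be evaluated with an empty
// row; the Literal case covers an already constant-folded false.
def isAlwaysFalse(filter: Expression): Boolean = filter match {
  case Literal(false, BooleanType) => true
  case e if e.deterministic && e.foldable =>
    e.eval(InternalRow.empty) == false
  case _ => false
}
```

With such a guard in place, the procedure could log that the filter matches no files and return an empty result set, which is the behavior requested above.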

Query engine

Spark
