
Spark: use where condition to control the execution of rewriteDataFiles for procedure rewrite data files in spark sql #6759

Closed
ludlows opened this issue Feb 7, 2023 · 0 comments · Fixed by #6760


ludlows commented Feb 7, 2023

Feature Request / Improvement

Improvement

Background

It is well known that rewriteDataFiles should be run periodically.
In our production environment, we want to run rewriteDataFiles on an Iceberg table once a month using the Spark SQL procedure rewrite_data_files.

For convenience, we add the following SQL command to each daily ETL job:
call catalog.system.rewrite_data_files(table=>'hive.iceberg_table', where => "load_date > '$LASTMONTH' and load_date < '$CURRENTMONTH' and substr('$TODAY', 7, 2) = '03'")
For instance, when $TODAY = '20230208', the where condition is always false, so we expected rewrite_data_files to exit directly.
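Until such a short-circuit exists, the date gate can be evaluated on the driver side instead of inside `where`. The sketch below is illustrative only (the table name, variable names, and a live `SparkSession` are assumed):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative workaround: evaluate the day-of-month gate in Scala and
// only call the procedure on the 3rd, so `where` never carries an
// always-false predicate.
def monthlyRewrite(spark: SparkSession, today: String,
                   lastMonth: String, currentMonth: String): Unit = {
  // day-of-month from a 'yyyyMMdd' string, e.g. "20230208" -> "08"
  if (today.substring(6, 8) == "03") {
    spark.sql(
      s"""CALL catalog.system.rewrite_data_files(
         |  table => 'hive.iceberg_table',
         |  where => "load_date > '$lastMonth' and load_date < '$currentMonth'")""".stripMargin)
  }
}
```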

In other words, we get an exception when executing this SQL:
call catalog.system.rewrite_data_files(table=>'hive.iceberg_table', where => " '01'='03' ")
It is an AnalysisException, raised by the Scala line below, because the Option produced by filtering on the where condition is empty.

}.getOrElse(throw new AnalysisException("Failed to find filter expression"))

Our Request

So, would it be possible to make rewrite_data_files exit directly, without throwing an exception, when the where condition is deterministically false?
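One possible shape for such a short-circuit, sketched against Spark's Catalyst API (this is an illustrative guard, not necessarily what the linked fix does): if the resolved `where` expression is deterministic and foldable, it can be evaluated once without any input row, and a constant-false result would let the procedure return an empty result instead of throwing.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.types.BooleanType

// Illustrative guard: detect a `where` filter that can never match.
// A deterministic, foldable expression can be evaluated with an empty
// row; the Literal case covers an already constant-folded false.
def isAlwaysFalse(filter: Expression): Boolean = filter match {
  case Literal(false, BooleanType) => true
  case e if e.deterministic && e.foldable =>
    e.eval(InternalRow.empty) == false
  case _ => false
}
```

With such a guard in place, the procedure could log that the filter matches no files and return an empty result set, which is the behavior requested above.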

Query engine

Spark
