[Feature Request] Partition pruning support for file listing during Vacuum #1691

Closed
1 of 3 tasks
arunravimv opened this issue Apr 12, 2023 · 0 comments
Labels
enhancement New feature or request

arunravimv commented Apr 12, 2023

Feature request

Overview

We propose that Vacuum support partition filters, so that users can limit how much of the object store is scanned and run Vacuum incrementally.

Motivation

Currently, users have large tables with daily or hourly partitions spanning many years. Of all these partitions, only the recent ones are subject to change, due to job reruns, corrections and late-arriving events.

When Vacuum is run on these tables, it lists every partition and can run for several hours or even days. The duration grows as the tables grow, and Vacuum becomes a major overhead for customers, especially when they have hundreds or thousands of such Delta tables. For large tables, the file system scan takes most of the time in a Vacuum operation, mostly because of the limited parallelism achievable and API throttling on the object stores.

Further details

We suggest an improvement where Vacuum supports a WHERE clause with partition predicates, just like the Optimize command. With a predicate, Vacuum lists only the subset of paths/directories that match it. The predicate limits only the scan: the files found are still compared against the whole table metadata to identify the files to be deleted. Only filters on partition key attributes are supported by this WHERE clause.

VACUUM table_name [WHERE predicate] [RETAIN num HOURS] [DRY RUN]
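
The proposed behavior can be illustrated with a small Python sketch. It is not Delta Lake code: the `StoredFile` type, the `vacuum_candidates` function, the in-memory `store` dictionary standing in for the object store, and the sample paths are all hypothetical. It models a call such as `VACUUM events WHERE date >= '2023-04-01' RETAIN 168 HOURS DRY RUN` (table name and values made up), where only matching partition directories are listed but every listed file is still checked against the full set of files referenced by the table metadata.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class StoredFile:
    path: str          # path relative to the table root
    age_hours: float   # hours since the file was last modified


def parse_partition(directory: str) -> Dict[str, str]:
    """Turn 'date=2023-04-10/hour=05' into {'date': '2023-04-10', 'hour': '05'}."""
    return dict(part.split("=", 1) for part in directory.split("/"))


def vacuum_candidates(
    store: Dict[str, List[StoredFile]],          # partition directory -> files (stand-in for the object store)
    referenced: Set[str],                        # files referenced by the WHOLE table's metadata
    predicate: Callable[[Dict[str, str]], bool], # partition-only filter (the WHERE clause)
    retain_hours: float,
) -> Tuple[List[str], int]:
    """Return (files a dry run would report, number of directories listed)."""
    listed_dirs = 0
    candidates: List[str] = []
    for directory, files in store.items():
        if not predicate(parse_partition(directory)):
            continue  # pruned: this directory is never listed
        listed_dirs += 1
        for f in files:
            # Unreferenced AND older than the retention window.
            if f.path not in referenced and f.age_hours > retain_hours:
                candidates.append(f.path)
    return sorted(candidates), listed_dirs


store = {
    "date=2021-01-01": [
        StoredFile("date=2021-01-01/a.parquet", 20000),
        StoredFile("date=2021-01-01/stale.parquet", 20000),
    ],
    "date=2023-04-10": [
        StoredFile("date=2023-04-10/b.parquet", 300),
        StoredFile("date=2023-04-10/old.parquet", 300),
        StoredFile("date=2023-04-10/new.parquet", 2),
    ],
}
referenced = {"date=2021-01-01/a.parquet", "date=2023-04-10/b.parquet"}

to_delete, listed = vacuum_candidates(
    store, referenced, lambda p: p["date"] >= "2023-04-01", retain_hours=168
)
print(to_delete)  # ['date=2023-04-10/old.parquet']
print(listed)     # 1
```

In this sketch only one of the two directories is listed. A consequence worth noting is that `stale.parquet` in the pruned 2021 partition is not found, so under this model it would remain until a Vacuum whose predicate covers that partition (or an unfiltered Vacuum) is run.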

Design doc: https://docs.google.com/document/d/1vRTfMk3bRmCWLa-E4W-UaNOgFo_DyFCcCMVjB1GzrcU/edit?usp=sharing

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.