
Perform S3 directory deletion with batch requests #13974

Merged
merged 6 commits into from
Dec 9, 2022

Conversation

findinpath
Contributor

@findinpath findinpath commented Sep 2, 2022

Description

Speed up the deletion of an S3 "directory" (a path prefix that corresponds to multiple S3 objects) by using batch deletion requests.

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Hive, Delta, Iceberg (Lakehouse connectors)

How would you describe this change to a non-technical end user or system administrator?

Speed up the deletion of an S3 "directory" (a path prefix that corresponds to multiple S3 objects) by using batch deletion requests.

Related issues, pull requests, and links

Fixes #13017

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Delta, Hive, Iceberg
* Ensure the rename/delete effectiveness for S3 directories which appear shallowly like objects

@pettyjamesm
Member

I'm not sure this logic totally makes sense. An S3 object with Content-Type: "application/octet-stream", zero bytes, and a key that doesn't end in / (e.g. s3://bucket/key) seems like it should be interpreted as an "empty file", not a directory. I could see the argument for interpreting an S3 object s3://bucket/key/ as a directory, or s3://bucket/key as a "directory" even if no object with that key exists, so long as a "child object" exists (e.g. some object with key s3://bucket/key/object). But when an S3 object actually exists without the trailing slash, I think we have to interpret it as a "file" and not a directory unless it's appropriately marked with the right content type.

Now with that said, it's important to remember that S3 is not a file system but an object store, so it doesn't really have directories. So there might be a better option for handling delete(Path path, boolean recursive) when recursive is true. In that case, instead of doing this, which recursively generates S3 listing, getObjectMetadata, and deleteObject calls:

for (FileStatus file : listStatus(path)) {
  delete(file.getPath(), true);
}

You could instead do something like (simplified to ignore details):

Iterator<S3ObjectSummary> listings = S3Objects.withPrefix(s3, bucketName, keyFromPath(path) + "/").iterator();
Iterator<String> keys = Iterators.transform(listings, S3ObjectSummary::getKey);
Iterator<List<String>> keyBatches = Iterators.partition(keys, 1000);
while (keyBatches.hasNext()) {
  String[] keysInBatch = keyBatches.next().toArray(String[]::new);
  // TODO: handle MultiObjectDeleteException in case some deletes fail
  s3.deleteObjects(new DeleteObjectsRequest(bucketName).withKeys(keysInBatch));
}

This will be much more efficient, faster, and will completely remove all S3 objects whose keys actually start with the prefix "s3://<bucket>/<path>/".
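The batching step above can be sketched without the AWS SDK at all. This is a minimal, self-contained illustration (hypothetical class and key names); the 1,000-key batch size matches the documented maximum for a single S3 DeleteObjects request, and the actual `s3.deleteObjects` call is only indicated in a comment:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDeleteSketch {
    static final int MAX_KEYS_PER_BATCH = 1000; // S3 DeleteObjects accepts at most 1000 keys per request

    // Split the full key list into consecutive batches of at most batchSize keys
    static List<List<String>> partition(List<String> keys, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            batches.add(keys.subList(i, Math.min(i + batchSize, keys.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        // 2500 hypothetical object keys under the "directory" prefix
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            keys.add("path/object-" + i);
        }
        List<List<String>> batches = partition(keys, MAX_KEYS_PER_BATCH);
        // Each batch would become one request, e.g. (AWS SDK v1, not executed here):
        // s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(batch.toArray(new String[0])));
        System.out.println(batches.size()); // prints 3 (1000 + 1000 + 500)
    }
}
```

The point of the partitioning is that 2,500 objects cost 3 DeleteObjects calls instead of 2,500 individual deleteObject calls.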

@findinpath findinpath changed the title Ensure the rename/delete effectiveness for S3 directories which appear shallowly like objects Perform S3 directory deletion with batch requests Sep 5, 2022
@findinpath
Contributor Author

@pettyjamesm I have addressed your comment.

As you mentioned offline there is definitely room for improvement in io.trino.plugin.hive.s3.TrinoS3FileSystem#rename . I would suggest addressing the refactoring of this method in a separate PR.

Member

@pettyjamesm pettyjamesm left a comment


Added review comments. I think we probably want to reorganize the logic to check recursive explicitly instead of handling it by falling through, since that's now a much more significant factor in the implementation logic. Recursive deletes for "implied directories" (i.e. the key does not exist but is a prefix of keys that do) and for "actual directories" (the key exists, but has content type DIRECTORY_MEDIA_TYPE) can proceed, but "empty objects" must be considered files and should not allow recursive deletes.
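The three cases above can be summarized as a small decision function. This is a sketch of the proposed rule only, with hypothetical flag names standing in for the actual lookups (HeadObject, content-type check, prefix listing), not the real TrinoS3FileSystem code:

```java
public class S3PathClassifier {
    enum Kind { IMPLIED_DIRECTORY, ACTUAL_DIRECTORY, FILE, NOT_FOUND }

    // keyExists: an object exists at exactly this key (no trailing slash)
    // hasDirectoryContentType: that object is marked with DIRECTORY_MEDIA_TYPE
    // hasChildren: at least one object exists under key + "/"
    static Kind classify(boolean keyExists, boolean hasDirectoryContentType, boolean hasChildren) {
        if (keyExists) {
            // An existing object counts as a directory only when explicitly marked;
            // a zero-byte object without the marker is still an (empty) file.
            return hasDirectoryContentType ? Kind.ACTUAL_DIRECTORY : Kind.FILE;
        }
        return hasChildren ? Kind.IMPLIED_DIRECTORY : Kind.NOT_FOUND;
    }

    public static void main(String[] args) {
        // Recursive deletes may proceed for the first two kinds, never for FILE
        System.out.println(classify(false, false, true));  // IMPLIED_DIRECTORY
        System.out.println(classify(true, true, false));   // ACTUAL_DIRECTORY
        System.out.println(classify(true, false, false));  // FILE (even when zero bytes)
    }
}
```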

}

// check if this path is a directory
Iterator<LocatedFileStatus> iterator = listPath(path, OptionalInt.of(1), ListingMode.SHALLOW_ALL);
Member


I think this check will succeed for non-directory objects, since listing a prefix for any key that exists will return the object at that key.

Contributor Author


Do note that listPath uses path + "/" as the prefix for the listing, not path.
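A tiny illustration of why the trailing slash matters here (hypothetical helper, not the actual listPath code): a listing with prefix "key/" matches only keys strictly under the "directory" and never an object stored at exactly "key":

```java
public class PrefixListingSketch {
    // Mimics what prefix matching does in an S3 ListObjects call with prefix = directoryKey + "/"
    static boolean matchedByDirectoryListing(String objectKey, String directoryKey) {
        return objectKey.startsWith(directoryKey + "/");
    }

    public static void main(String[] args) {
        System.out.println(matchedByDirectoryListing("key", "key"));        // false: the object at the bare key itself
        System.out.println(matchedByDirectoryListing("key/object", "key")); // true: a child object under the prefix
    }
}
```

So a non-empty listing under path + "/" indicates children exist, without being confused by an object at the bare key.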

@findinpath
Contributor Author

I think we probably want to reorganize the logic to check recursive explicitly

I agree. The current state of the code for the delete method is rather hard to follow. I'll do a subsequent refactoring.

@pettyjamesm I've added a set of tests against AWS S3 object storage in order to be able to test the assumptions from the code.

@electrum
Member

electrum commented Sep 7, 2022

We’re planning to replace TrinoS3FileSystem with a native TrinoFileSystem implementation, which has an API designed for object stores. Iceberg and Delta Lake already use it. Maybe instead of improving code that we’ll throw away, we should build the new implementation now?

@findinpath
Contributor Author

@electrum I understand the direction of replacing TrinoS3FileSystem with a native TrinoFileSystem implementation.
However, my change is narrowly scoped (it concerns only the delete method), while starting the endeavour of building a native TrinoFileSystem implementation would take much more time.
Even if the bugfix is short- to medium-lived, I still see a benefit in landing this new approach.

@findepi
Member

findepi commented Nov 30, 2022

/test-with-secrets sha=12265ea06d6ac9e006bb56973cb23cc54a8e9d2d

@findepi
Member

findepi commented Dec 1, 2022

(rebased to fix test-with-secrets job)

@findepi
Member

findepi commented Dec 1, 2022

/test-with-secrets sha=07e689f46a2a40fe6e618b02a0eae4a2ff492b76

@findepi
Member

findepi commented Dec 1, 2022

@github-actions

github-actions bot commented Dec 1, 2022

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/3591234020

Delete all the objects found under the specified path
prefix in batches in order to improve the performance of
the operation.

This operation is customised for the needs of Trino and
intentionally does not add unnecessary complexity for
dealing with the various quirks of S3-compatible object
storage systems that are not relevant to Trino's use cases.
@findepi
Member

findepi commented Dec 5, 2022

@findinpath what were the changes in the last two force-pushes (#13974 (comment), #13974 (comment))?

@findinpath
Contributor Author

I have changed the setup() and tearDown() methods from TestTrinoS3FileSystemMinio so that there are no resources allocated in the constructor.

@findepi
Member

findepi commented Dec 5, 2022

/test-with-secrets sha=ba2d0b67d0890174483733fe48012373c7478ead

@github-actions

github-actions bot commented Dec 5, 2022

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/3623643550

@findepi
Member

findepi commented Dec 9, 2022

/test-with-secrets sha=62a44129acd1d4db0e8630062e3c01901bac0882

@findinpath
Contributor Author

CI hit #13199

@github-actions

github-actions bot commented Dec 9, 2022

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/3657194620

@findepi
Member

findepi commented Dec 9, 2022

@nineinchnick do you understand the Pull Request Labeler / pt with secrets (pull_request_target) failure (https://github.com/trinodb/trino/pull/13974/checks?check_run_id=9998437275)?

Development

Successfully merging this pull request may close these issues.

Delta Lake connector doesn't remove content of MANAGED_TABLE Glue tables created by Databricks
8 participants