Do not retrieve Blob metadata on batch delete #23650

Merged
wendigo merged 1 commit into master from serafin/gcs-delete-blob-id
Oct 2, 2024

Conversation

wendigo
Contributor

@wendigo wendigo commented Oct 2, 2024

This should make batch delete much faster, since it no longer resolves blob metadata for each file before deleting it.

Description

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 2, 2024
Member

@elonazoulay elonazoulay left a comment

nice!

-    getBlob(storage, new GcsLocation(location))
-            .ifPresent(blob -> batch.delete(blob.getBlobId()));
+    GcsLocation gcsLocation = new GcsLocation(location);
+    batch.delete(BlobId.of(gcsLocation.bucket(), gcsLocation.path()));
Member

I assume this does not throw if the blob does not exist.

Contributor Author

It doesn't throw:

Calling StorageBatchResult.get() on the return value yields true upon successful deletion, false if the blob was not found, or throws a StorageException if the operation failed.
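
For illustration only, here is a minimal, self-contained sketch of the contract quoted above. It does not use the GCS client: `DeleteResult`, `InMemoryBatch`, and `deleteAll` are hypothetical stand-ins that only mimic the "true if deleted, false if not found" behavior, so a caller can see why no metadata lookup is needed before queuing a delete. The failure case (a thrown exception) is not modeled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchDeleteSketch {
    /** Hypothetical stand-in for a per-operation batch result; not the real GCS class. */
    static final class DeleteResult {
        private Boolean deleted; // filled in when the batch is submitted

        boolean get() {
            if (deleted == null) {
                throw new IllegalStateException("batch not submitted yet");
            }
            return deleted;
        }
    }

    /** Hypothetical in-memory batch: queues deletes by name and resolves them on submit(). */
    static final class InMemoryBatch {
        private final Set<String> blobs;
        private final List<String> names = new ArrayList<>();
        private final List<DeleteResult> results = new ArrayList<>();

        InMemoryBatch(Set<String> blobs) {
            this.blobs = blobs;
        }

        // Queues a delete without checking whether the blob exists.
        DeleteResult delete(String name) {
            DeleteResult result = new DeleteResult();
            names.add(name);
            results.add(result);
            return result;
        }

        void submit() {
            for (int i = 0; i < names.size(); i++) {
                // Set.remove returns false for a missing element,
                // mirroring "false if the blob was not found".
                results.get(i).deleted = blobs.remove(names.get(i));
            }
        }
    }

    /** Deletes all names in one batch and returns how many actually existed. */
    static int deleteAll(Set<String> blobs, List<String> names) {
        InMemoryBatch batch = new InMemoryBatch(blobs);
        List<DeleteResult> results = new ArrayList<>();
        for (String name : names) {
            results.add(batch.delete(name));
        }
        batch.submit();

        int deleted = 0;
        for (DeleteResult result : results) {
            if (result.get()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Set<String> blobs = new HashSet<>(List.of("a.parquet", "b.parquet"));
        int deleted = deleteAll(blobs, List.of("a.parquet", "missing.parquet"));
        // prints "1 deleted, 1 remaining"
        System.out.println(deleted + " deleted, " + blobs.size() + " remaining");
    }
}
```

The missing name simply yields `false` instead of an error, which is the property the review question was about.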

@wendigo wendigo merged commit f7a25a1 into master Oct 2, 2024
69 checks passed
@wendigo wendigo deleted the serafin/gcs-delete-blob-id branch October 2, 2024 21:11
@github-actions github-actions bot added this to the 460 milestone Oct 2, 2024
@mosabua mosabua mentioned this pull request Oct 2, 2024
1 task
@@ -169,8 +169,8 @@ public void deleteFiles(Collection<Location> locations)
for (List<Location> locationBatch : partition(locations, batchSize)) {
StorageBatch batch = storage.batch();
for (Location location : locationBatch) {
getBlob(storage, new GcsLocation(location))
Contributor

I'm curious: did you spot this by chance, or was a bulk file delete slow?

4 participants