Delete bundle content stored in Azure/GCS when deleting a bundle #4308

Merged (6 commits) on Dec 5, 2022

@wwwjn wwwjn commented Nov 9, 2022

Reasons for making this change

When a bundle is deleted and its content is stored in cloud storage, we need to delete the actual content as well.

Tested with Azure and GCS: after a user runs cl rm, the remote content is deleted.

Related issues

#3965

Screenshots

Checklist

  • I've added a screenshot of the changes, if this is a frontend change
  • I've added and/or updated tests, if this is a backend change
  • I've run the pre-commit.sh script
  • I've updated docs, if needed

@wwwjn wwwjn changed the title from "Delete bundle contents in Azure/GCS" to "Delete bundle content stored in Azure/GCS when deleting a bundle" Nov 9, 2022
@wwwjn
Contributor Author

wwwjn commented Nov 9, 2022

TODO: Test delete speed. Create a new PR to make delete async.

if bundle_location.startswith(
    StorageURLScheme.AZURE_BLOB_STORAGE.value
) or bundle_location.startswith(StorageURLScheme.GCS_STORAGE.value):
    FileSystems.delete([file_location])
Contributor

Just want to make sure this works with both GCS and Azure, since I know in the upload case they had to be dealt with differently

Contributor Author

Just want to make sure this works with both GCS and Azure, since I know in the upload case they had to be dealt with differently

Yes, it works for both Azure and GCS, because this part of the code runs on the CodaLab server, whereas bypass upload happens on the client side. Only on the client side do we need to handle Azure and GCS differently.
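As a rough illustration of why a single server-side code path can cover both providers, here is a minimal sketch of the scheme check from the diff. The enum values ("azfs://", "gs://"), the helper name delete_if_cloud, and the injected delete_fn are assumptions made for this example; in the PR the delete call is FileSystems.delete, which delete_fn stands in for so the sketch runs without cloud credentials.

```python
from enum import Enum
from typing import Callable, List


class StorageURLScheme(Enum):
    # Assumed values for illustration; only the member names appear in the diff.
    AZURE_BLOB_STORAGE = "azfs://"
    GCS_STORAGE = "gs://"


# str.startswith accepts a tuple, so both schemes are checked in one call.
CLOUD_SCHEMES = tuple(scheme.value for scheme in StorageURLScheme)


def delete_if_cloud(
    bundle_location: str,
    file_location: str,
    delete_fn: Callable[[List[str]], None],
) -> bool:
    """Delete file_location only when bundle_location uses a cloud URL scheme.

    delete_fn is a hypothetical stand-in for the real delete call and
    receives a list of paths. Returns True if a delete was issued.
    """
    if not bundle_location.startswith(CLOUD_SCHEMES):
        return False
    delete_fn([file_location])
    return True
```

With a list's extend method as delete_fn, the paths that would have been deleted can be recorded and inspected, which is one way to exercise this branch in a unit test.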

@AndrewJGaut
Contributor

TODO: Test delete speed. create a new PR to make delete async.

Yes, we should benchmark this. I need to do the same for the changes I'm trying to make to speed up cl rm anyways, so I can help with that.

-    if default_bundle_store['storage_type'] in (StorageType.AZURE_BLOB_STORAGE.value,):
+    if default_bundle_store['storage_type'] in (
+        StorageType.AZURE_BLOB_STORAGE.value,
+        StorageType.GCS_STORAGE.value,
Member

what does this change do? Does it fix a bug?

Contributor Author

what does this change do? Does it fix a bug?

Yeah, this is a bug fix. This if branch handles the case where CODALAB_DEFAULT_BUNDLE_STORE_NAME points to a cloud bundle store (Azure / GCS). I forgot to add GCS when PR #4119 was merged. Since we have never set GCS as the default storage, it has not caused an actual error.
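The effect of the fix can be sketched as a membership check against a tuple of cloud storage types. The enum values and the helper name is_cloud_default_store below are assumptions for illustration; only the member names and the 'storage_type' key come from the diff.

```python
from enum import Enum


class StorageType(Enum):
    # Assumed values for illustration; only the member names appear in the diff.
    DISK_STORAGE = "disk"
    AZURE_BLOB_STORAGE = "azure_blob"
    GCS_STORAGE = "gcs"


# Before the fix this tuple held only the Azure value, so a GCS default
# store would not have taken the cloud branch.
CLOUD_STORAGE_TYPES = (
    StorageType.AZURE_BLOB_STORAGE.value,
    StorageType.GCS_STORAGE.value,
)


def is_cloud_default_store(default_bundle_store: dict) -> bool:
    """Return True when the default bundle store is backed by Azure or GCS."""
    return default_bundle_store['storage_type'] in CLOUD_STORAGE_TYPES
```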

if bundle_location.startswith(
    StorageURLScheme.AZURE_BLOB_STORAGE.value
) or bundle_location.startswith(StorageURLScheme.GCS_STORAGE.value):
    FileSystems.delete([file_location])
Member

Is it possible to add a test, particularly for more complex code like file_location = '/'.join(bundle_location.split('/')[0:-1]) + "/"?

Contributor Author

This line gets the path of the folder that contains content.gz and index.sqlite.
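A test along the lines the reviewer asks for might look like the sketch below. The expression is the one quoted in the review comment; the wrapper name parent_folder and the example paths are hypothetical and not taken from the PR.

```python
def parent_folder(bundle_location: str) -> str:
    """Drop the last path component and return the folder with a trailing slash."""
    return '/'.join(bundle_location.split('/')[0:-1]) + "/"


def test_parent_folder() -> None:
    # Hypothetical Azure and GCS locations pointing at the archive file.
    assert (
        parent_folder("azfs://acct/bundles/0x1/content.gz")
        == "azfs://acct/bundles/0x1/"
    )
    assert parent_folder("gs://bucket/0x2/content.gz") == "gs://bucket/0x2/"
    # A location that already ends in "/" has an empty last component,
    # so it comes back unchanged.
    assert parent_folder("gs://bucket/0x2/") == "gs://bucket/0x2/"


test_parent_folder()
```

Deleting the folder rather than the single file is what removes index.sqlite along with the archive, so the trailing-slash case is worth pinning down in a test.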

@wwwjn wwwjn merged commit 10f06bb into master Dec 5, 2022
@wwwjn wwwjn deleted the delete-bundle branch December 5, 2022 18:47
@AndrewJGaut AndrewJGaut mentioned this pull request Dec 6, 2022