-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete bundle content stored in Azure/GCS when deleting a bundle #4308
Conversation
TODO: Test delete speed. create a new PR to make delete async. |
if bundle_location.startswith( | ||
StorageURLScheme.AZURE_BLOB_STORAGE.value | ||
) or bundle_location.startswith(StorageURLScheme.GCS_STORAGE.value): | ||
FileSystems.delete([file_location]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to make sure this works with both GCS and Azure, since I know in the upload case they had to be dealt with differently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to make sure this works with both GCS and Azure, since I know in the upload case they had to be dealt with differently
Yes it works for both Azure and GCS. Because this part of code is running on Codalab server, while bypass upload happens at the client side. At user client side, we need handle Azure and GCS differently.
Yes, we should benchmark this. I need to do the same for the changes I'm trying to make to speed up cl rm anyways, so I can help with that. |
if default_bundle_store['storage_type'] in (StorageType.AZURE_BLOB_STORAGE.value,): | ||
if default_bundle_store['storage_type'] in ( | ||
StorageType.AZURE_BLOB_STORAGE.value, | ||
StorageType.GCS_STORAGE.value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this change do? Does it fix a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this change do? Does it fix a bug?
Yeah this is a bug fixing. This if
branch here is to handle when we set CODALAB_DEFAULT_BUNDLE_STORE_NAME
is pointing to a cloud bundle store (Azure / GCS). However, I forgot to add GCS when get PR #4119 merged. But we never set GCS as a default storage before, so it does not cause an actual error.
if bundle_location.startswith( | ||
StorageURLScheme.AZURE_BLOB_STORAGE.value | ||
) or bundle_location.startswith(StorageURLScheme.GCS_STORAGE.value): | ||
FileSystems.delete([file_location]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible to add a test? particularly for more complex code like file_location = '/'.join(bundle_location.split('/')[0:-1]) + "/"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is to get the folder path that continues content.gz and index.sqlite
Reasons for making this change
We need to delete the actual content when deleting the bundle, if the content is stored on cloud storage.
Test it using Azure and GCS. After user call
cl rm
, the remote content is deleted.Related issues
#3965
Screenshots
Checklist