Delete also blobs ending in slash when deleting directory contents #21145

Merged

findinpath
Contributor

@findinpath findinpath commented Mar 18, 2024

Description

AWS S3 allows creating "folder" blobs with the media type "application/x-directory". When deleting the contents of a directory, delete these blobs along with the regular ones, so that the directory corresponding to a table is fully deleted.
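
For illustration, here is a minimal sketch of the behavior described above. It is not the code from this PR: it replaces S3 with an in-memory sorted set of keys, and the class name `DirectoryDeleteSketch`, its methods, and the `includeDirectoryMarkers` flag are all hypothetical. It only shows why skipping keys that end in a slash leaves "folder" blobs behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a bucket: a sorted set of object keys (no AWS SDK involved).
class DirectoryDeleteSketch {
    // Collects the keys under `directory`. Keys ending in "/" model "folder" blobs.
    static List<String> keysToDelete(TreeSet<String> bucket, String directory, boolean includeDirectoryMarkers) {
        String prefix = directory.endsWith("/") ? directory : directory + "/";
        List<String> result = new ArrayList<>();
        for (String key : bucket.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // keys are sorted, so nothing after this point shares the prefix
            }
            if (!includeDirectoryMarkers && key.endsWith("/")) {
                continue; // assumed old behavior: "folder" blobs are skipped
            }
            result.add(key);
        }
        return result;
    }

    // Removes everything selected by keysToDelete from the in-memory bucket.
    static void deleteDirectoryContents(TreeSet<String> bucket, String directory, boolean includeDirectoryMarkers) {
        bucket.removeAll(keysToDelete(bucket, directory, includeDirectoryMarkers));
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "warehouse/t1/",                      // "folder" blob for the table directory
                "warehouse/t1/_delta_log/",           // "folder" blob for a subdirectory
                "warehouse/t1/_delta_log/00000.json",
                "warehouse/t1/part-0.parquet",
                "warehouse/t2/part-0.parquet");       // unrelated table, must survive

        TreeSet<String> skippingMarkers = new TreeSet<>(keys);
        deleteDirectoryContents(skippingMarkers, "warehouse/t1", false);
        System.out.println("markers skipped:  " + skippingMarkers);

        TreeSet<String> includingMarkers = new TreeSet<>(keys);
        deleteDirectoryContents(includingMarkers, "warehouse/t1", true);
        System.out.println("markers included: " + includingMarkers);
    }
}
```

In this sketch, skipping the markers leaves `warehouse/t1/` and `warehouse/t1/_delta_log/` in the bucket, so the table directory still appears to exist. Including them leaves only the unrelated `warehouse/t2/part-0.parquet`.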

Additional context and related issues

The solution is inspired by #13974.

Fixes #21111

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# S3 File System
* Delete also blobs ending in slash when deleting directory contents. ({issue}`issuenumber`)

@ebyhr
Member

ebyhr commented Mar 18, 2024

/test-with-secrets sha=723929b0c851fc76607639506c5b5309a32c5f95

github-actions bot commented Mar 18, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/8334770270

@findinpath
Contributor Author

@findinpath findinpath force-pushed the findinpath/s3-delete-directory-objects branch 2 times, most recently from 2be7961 to cca6b2d Compare March 19, 2024 09:44
@findinpath findinpath self-assigned this Mar 19, 2024
@anusudarsan
Member

@findinpath can you also confirm if #21135 (comment) is green with this fix?

@findinpath
Contributor Author

tests               | 2024-03-19 20:49:19 INFO: [8 of 1] io.trino.tests.product.deltalake.TestDeltaLakeDropTableCompatibility.testDropTable [DELTA, DELTA, false] (Groups: profile_specific_tests, delta-lake-databricks, delta-lake-oss)
presto-master       | 2024-03-19T20:49:21.192+0545	INFO	dispatcher-query-5	io.trino.event.QueryMonitor	TIMELINE: Query 20240319_150420_00019_3xrdv :: FINISHED :: elapsed 433ms :: planning 0ms :: waiting 0ms :: scheduling 433ms :: running 0ms :: finishing 433ms :: begin 2024-03-19T20:49:20.759+05:45 :: end 2024-03-19T20:49:21.192+05:45
tests               | 2024-03-19 20:49:28 INFO: SUCCESS     /    io.trino.tests.product.deltalake.TestDeltaLakeDropTableCompatibility.testDropTable [DELTA, DELTA, false] (Groups: profile_specific_tests, delta-lake-databricks, delta-lake-oss) took 8.8 seconds
tests               | 2024-03-19 20:49:28 INFO: 
tests               | 2024-03-19 20:49:28 INFO: Completed 8 tests
tests               | 2024-03-19 20:49:28 INFO: 8 SUCCEEDED      /      0 FAILED      /      0 SKIPPED
tests               | 2024-03-19 20:49:28 INFO: Tests execution took 1 minutes and 18 seconds
tests               | 2024-03-19 20:49:28 INFO: ManageTestResources.onFinish: running checks
tests               | 
tests               | ===============================================
tests               | tempto-tests
tests               | Total tests run: 8, Failures: 0, Skips: 0
tests               | ===============================================
tests               | 
....
➜  trino git:(findinpath/s3-delete-directory-objects) git log --oneline 

bf9b8a0023 (HEAD -> findinpath/s3-delete-directory-objects) Use native S3 filesystem for Databricks tests
cca6b2d138 (findinpath/findinpath/s3-delete-directory-objects) Delete also blobs ending in slash when deleting directory contents
e23c4165f8 (trino/master, trino/HEAD, master) Remove unused methods

can you also confirm if #21135 (comment) is green with this fix?

Yes, I can. The current solution also works as expected for the Databricks tests.
Note that there was no need to apply any .acl(...) on PutObjectRequest.

@findinpath findinpath force-pushed the findinpath/s3-delete-directory-objects branch from cca6b2d to 0b5b6fa Compare March 19, 2024 20:11
@findinpath
Contributor Author

Unrelated CI failure: pt (default, suite-tpch, )
https://github.com/trinodb/trino/actions/runs/8349481254/job/22854092598#step:4:32
Unable to download artifact(s): Unable to download and extract artifact: Artifact download failed after 5 retries.

@findinpath findinpath force-pushed the findinpath/s3-delete-directory-objects branch from 0b5b6fa to 5fda039 Compare March 20, 2024 06:04
@findepi
Member

findepi commented Mar 20, 2024

/test-with-secrets sha=5fda039a57f5e106d3c0990f93da389a1f4c30cd

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8357686550

@wendigo wendigo merged commit f7db654 into trinodb:master Mar 21, 2024
62 checks passed
@github-actions github-actions bot added this to the 443 milestone Mar 21, 2024
Development

Successfully merging this pull request may close these issues.

Delta lake connector leaves behind directory when dropping managed tables created by Databricks