-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lightning: support opening empty files and skip directory objects in S3 #33029
Conversation
* Also refined the S3 integration test
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
/cc @gozssky @glorv |
* Refined the test for introducing some empty files.
Since the bug will be triggered when:
If either situation is avoided, the problem will be fixed. Previous solutions tackled on avoiding the situation 1. That is, to avoid calling Now I decide to change the fixing solution: to avoid the happening on situation 2. That is, making every objects calling There are two ways to do this. The first way is to change the implementation of Open() for S3. If opening an empty file path, returns a In the definition of tidb/br/pkg/storage/storage.go Lines 84 to 88 in 824f434
It indicates that all the paths iterated in the hook function of This will make the main logic for restoring and making regions intact and clean. However, that leaves all Maybe the filtering logic for empty files can also be added in the hook function in WalkDir() calling in |
* Revert the makeParquetFileRegion logic * For S3 WalkDir(), filter out empty files
/run-unit-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsdashun WalkDir
may not be used only by Lightning. To be consistent with normal file system, I think we shouldn't skip all empty objects in WalkDir
.
For opening an empty object in S3, we can use S3API.HeadObject
to get the size information first.
Hi @gozssky , I think calling S3API.HeadObject in the I still think changing the behavior in |
@dsdashun To avoid unnecessary overhead, perhaps we can add a special check for the
Agree. |
* For `WalkDir()`, only the empty directories are omitted. * For `open()`, if the request is a full range request, skip passing Range argument, and use response.ContentLength as the object size information
Code Coverage Details: https://codecov.io/github/pingcap/tidb/commit/5e3381b54df919fde983989ca71cba96da1333ee |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest LGTM
/run-integration-br-test |
1 similar comment
/run-integration-br-test |
/merge |
This pull request has been accepted and is ready to merge. Commit hash: eda2f4e
|
@dsdashun: Your PR was out of date, I have automatically updated it for you. At the same time I will also trigger all tests for you: /run-all-tests If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
What problem does this PR solve?
Issue Number: close #31824
Problem Summary: When an empty directory object is matched during parquet import, an error will be reported
What is changed and how it works?
Open()
in S3 now supports opening empty files, so that the behavior is the same as in local file system.Seek()
operation in S3 added a real offset check, so that negative values are not supportedCheck List
Tests
Side effects
Documentation
Release note