-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40037: [C++][FS][Azure] Make attempted reads and writes against directories fail fast #40119
Merged
felipecrv
merged 20 commits into
apache:main
from
Tom-Newton:tomnewton/check_for_directory_marker_metadata/GH-40037
Feb 21, 2024
Merged
Changes from 17 commits
Commits
Show all changes
20 commits
Select commit
Hold shift + click to select a range
b305c14
Add MetadataIndicatesIsDirectory function
Tom-Newton af47546
Return not a file if trying to read or write a directory
Tom-Newton e0d5d2b
Add tests
Tom-Newton 67a5e20
Avoid some unhandled exceptions in destructor
Tom-Newton 5388f67
Add test DisallowCreatingFileAndDirectoryWithTheSameName
Tom-Newton 25dff2f
Add test cases for DisallowReadingOrWritingDirectoryMarkers with trai…
Tom-Newton bfdd719
Add comments explaining the strategy with ObjectInputFile
Tom-Newton ff72cdf
Ensure not a directory on OpenOutputStream and OpenAppendStream
Tom-Newton 8f573cb
More complete tests
Tom-Newton f71cc50
Tidy
Tom-Newton 37862c2
Add an assertion that metadata can be written without modifying the file
Tom-Newton 28e9bdd
Avoid errors in destructor more neatly
Tom-Newton 26d6e96
Small update to WriteMetadata test
Tom-Newton ab02113
Auto-format
Tom-Newton b905f5d
Move arguments to ObjectAppendStream Init instead of constructor
Tom-Newton aec7b34
Add a test case for outputstream when container does not exist
Tom-Newton be33953
Avoid unnecessary API call if kContainerNotFound
Tom-Newton 5ddd70c
Fix lint
Tom-Newton 8ac3c95
Tidy
Tom-Newton 111abbb
Lint again
Tom-Newton File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -826,6 +826,41 @@ class TestAzureFileSystem : public ::testing::Test { | |
AssertFileInfo(fs(), subdir3, FileType::Directory); | ||
} | ||
|
||
void TestDisallowReadingOrWritingDirectoryMarkers() { | ||
auto data = SetUpPreexistingData(); | ||
auto directory_path = data.Path("directory"); | ||
|
||
ASSERT_OK(fs()->CreateDir(directory_path)); | ||
ASSERT_RAISES(IOError, fs()->OpenInputFile(directory_path)); | ||
ASSERT_RAISES(IOError, fs()->OpenOutputStream(directory_path)); | ||
ASSERT_RAISES(IOError, fs()->OpenAppendStream(directory_path)); | ||
|
||
auto directory_path_with_slash = directory_path + "/"; | ||
ASSERT_RAISES(IOError, fs()->OpenInputFile(directory_path_with_slash)); | ||
ASSERT_RAISES(IOError, fs()->OpenOutputStream(directory_path_with_slash)); | ||
ASSERT_RAISES(IOError, fs()->OpenAppendStream(directory_path_with_slash)); | ||
} | ||
|
||
void TestDisallowCreatingFileAndDirectoryWithTheSameName() { | ||
auto data = SetUpPreexistingData(); | ||
auto path1 = data.Path("directory1"); | ||
ASSERT_OK(fs()->CreateDir(path1)); | ||
ASSERT_RAISES(IOError, fs()->OpenOutputStream(path1)); | ||
ASSERT_RAISES(IOError, fs()->OpenAppendStream(path1)); | ||
AssertFileInfo(fs(), path1, FileType::Directory); | ||
|
||
auto path2 = data.Path("directory2"); | ||
ASSERT_OK(fs()->OpenOutputStream(path2)); | ||
// CreateDir returns OK even if there is already a file or directory at this | ||
// location. Whether or not this is the desired behaviour is debatable. | ||
ASSERT_OK(fs()->CreateDir(path2)); | ||
AssertFileInfo(fs(), path2, FileType::File); | ||
} | ||
|
||
void TestOpenOutputStreamWithMissingContainer() { | ||
ASSERT_RAISES(IOError, fs()->OpenOutputStream("not-a-container/file", {})); | ||
} | ||
|
||
void TestDeleteDirSuccessEmpty() { | ||
if (HasSubmitBatchBug()) { | ||
GTEST_SKIP() << kSubmitBatchBugMessage; | ||
|
@@ -1559,6 +1594,20 @@ TYPED_TEST(TestAzureFileSystemOnAllScenarios, CreateDirOnMissingContainer) { | |
this->TestCreateDirOnMissingContainer(); | ||
} | ||
|
||
TYPED_TEST(TestAzureFileSystemOnAllScenarios, DisallowReadingOrWritingDirectoryMarkers) { | ||
this->TestDisallowReadingOrWritingDirectoryMarkers(); | ||
} | ||
|
||
TYPED_TEST(TestAzureFileSystemOnAllScenarios, | ||
DisallowCreatingFileAndDirectoryWithTheSameName) { | ||
this->TestDisallowCreatingFileAndDirectoryWithTheSameName(); | ||
} | ||
|
||
TYPED_TEST(TestAzureFileSystemOnAllScenarios, | ||
OpenOutputStreamWithMissingContainer) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is causing the linter to fail @Tom-Newton. Please fix and I will merge. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed |
||
this->TestOpenOutputStreamWithMissingContainer(); | ||
} | ||
|
||
TYPED_TEST(TestAzureFileSystemOnAllScenarios, DeleteDirSuccessEmpty) { | ||
this->TestDeleteDirSuccessEmpty(); | ||
} | ||
|
@@ -2162,6 +2211,18 @@ TEST_F(TestAzuriteFileSystem, WriteMetadata) { | |
.Value.Metadata; | ||
// Defaults are overwritten and not merged. | ||
EXPECT_EQ(Core::CaseInsensitiveMap{std::make_pair("bar", "foo")}, blob_metadata); | ||
|
||
// Metadata can be written without writing any data. | ||
ASSERT_OK_AND_ASSIGN( | ||
output, fs_with_defaults->OpenAppendStream( | ||
full_path, /*metadata=*/arrow::key_value_metadata({{"bar", "baz"}}))); | ||
ASSERT_OK(output->Close()); | ||
blob_metadata = blob_service_client_->GetBlobContainerClient(data.container_name) | ||
.GetBlockBlobClient(blob_path) | ||
.GetProperties() | ||
.Value.Metadata; | ||
// Defaults are overwritten and not merged. | ||
EXPECT_EQ(Core::CaseInsensitiveMap{std::make_pair("bar", "baz")}, blob_metadata); | ||
} | ||
|
||
TEST_F(TestAzuriteFileSystem, OpenOutputStreamSmall) { | ||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can inject
AzureFileSystem *azure_file_system
here and not have to allocate a closure for this. You would callAzureFileSystem::Impl::EnsureNotFlatNamespaceDirectory(location)
viaazure_file_system->impl_
(accessible because the handles produced by the azure file system can be friends with the filesystem class).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the extra info. I was planning to do this but I was struggling with the friends thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trick is to forward-declare the class as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my main problem was that
ObjectAppendStream
is defined inside an anonymous namespace but I still haven't got it working as you describe.Are you suggesting to use
AzureFileSystem *azure_file_system
orAzureFileSystem:Impl *azure_file_system
as the argument toObjectAppendStream::Impl
. I don't know how I can get aAzureFileSystem
pointer from insideAzureFileSystem::Impl
and usingAzureFileSystem::Impl
as the argument leads toincomplete type
errors which I don't think I can avoid.Also if you wouldn't mind I would be interested to know what the disadvantage of a lambda function is compared to what you proposed.
Sorry about my lacking C++ knowledge here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To create the
std::function
, you heap allocate an object with copies of the values in the capture list and generate a lot more extra code in the binary:When you think about an
std::function
this way (a pair of context data and a function), you realize the class you already serves that purpose.But hey, this is becoming challenging, so I won't hold the PR anymore because of this. Moving to
Init()
was a big step in the right direction.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining