Commit

apacheGH-43097: [C++] Implement PathFromUri support for Azure file system (apache#43098)

### Rationale for this change

See apache#43097.

### What changes are included in this PR?
Implements `AzureFileSystem::PathFromUri` by reusing the existing URI parsing and the path extraction in `AzureOptions::FromUri`.
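The tricky part is that Azure URIs carry the container in different places, so the authority is sometimes prepended to the path and sometimes not. The following is a minimal, self-contained sketch of the two forms covered by the unit test, written with only the standard library. `SketchPathFromAzureUri` is a hypothetical helper, not Arrow's API. It does not use `arrow::util::Uri` or `AzureOptions::FromUri`, it ignores ports, percent-encoding and the other URI forms `FromUri` accepts, and it returns `std::nullopt` where the real method returns an error status.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not part of Arrow): maps the two URI forms from the
// PathFromUri comment to a "container/path" string. It ignores ports,
// percent-encoding and every other form AzureOptions::FromUri accepts.
std::optional<std::string> SketchPathFromAzureUri(const std::string& uri) {
  const std::string kSep = "://";
  const auto scheme_end = uri.find(kSep);
  if (scheme_end == std::string::npos) return std::nullopt;

  // Only abfs/abfss are accepted, mirroring the scheme check in the diff.
  const std::string scheme = uri.substr(0, scheme_end);
  if (scheme != "abfs" && scheme != "abfss") return std::nullopt;

  // Split "<authority>/<path>".
  const auto authority_begin = scheme_end + kSep.size();
  const auto slash = uri.find('/', authority_begin);
  const std::string authority =
      slash == std::string::npos
          ? uri.substr(authority_begin)
          : uri.substr(authority_begin, slash - authority_begin);
  const std::string path =
      slash == std::string::npos ? "" : uri.substr(slash + 1);

  // Drop optional userinfo such as "acc:pw@".
  const auto at = authority.rfind('@');
  const std::string host =
      at == std::string::npos ? authority : authority.substr(at + 1);

  // Form (1): the host is the storage account endpoint, so the path already
  // starts with the container and the authority is not prepended.
  const std::string kBlobSuffix = ".blob.core.windows.net";
  if (host.size() > kBlobSuffix.size() &&
      host.compare(host.size() - kBlobSuffix.size(), kBlobSuffix.size(),
                   kBlobSuffix) == 0) {
    return path;
  }

  // Form (2): the host is the container, so it is prepended to the path.
  return path.empty() ? host : host + "/" + path;
}
```

Under these assumptions, both `abfss://storageacc.blob.core.windows.net/container/some/path` and `abfss://acc:pw@container/some/path` map to `container/some/path`, and an `http://` URI is rejected, which matches the expectations in the unit test.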

### Are these changes tested?
Yes, added a unit test.

### Are there any user-facing changes?
No, but calling `PathFromUri` now works instead of failing because no implementation was provided.
* GitHub Issue: apache#43097

Authored-by: Oliver Layer
Signed-off-by: Sutou Kouhei
OliLay authored Aug 14, 2024
1 parent fc80d7d commit 88e8140
Showing 3 changed files with 38 additions and 0 deletions.
27 changes: 27 additions & 0 deletions cpp/src/arrow/filesystem/azurefs.cc
@@ -3199,4 +3199,31 @@ Result<std::shared_ptr<io::OutputStream>> AzureFileSystem::OpenAppendStream(
  return impl_->OpenAppendStream(location, metadata, false, this);
}

Result<std::string> AzureFileSystem::PathFromUri(const std::string& uri_string) const {
  /// We cannot use `internal::PathFromUriHelper` here because for Azure we have to
  /// support different URI schemes where the authority is handled differently.
  /// Example (both should yield the same path `container/some/path`):
  /// - (1) abfss://storageacc.blob.core.windows.net/container/some/path
  /// - (2) abfss://acc:pw@container/some/path
  /// The authority is handled differently in these two URIs: (1) requires no
  /// prepending of the authority to the path, while (2) requires prepending the
  /// authority to the path.
  std::string path;
  Uri uri;
  RETURN_NOT_OK(uri.Parse(uri_string));
  RETURN_NOT_OK(AzureOptions::FromUri(uri, &path));

  std::vector<std::string> supported_schemes = {"abfs", "abfss"};
  const auto scheme = uri.scheme();
  if (std::find(supported_schemes.begin(), supported_schemes.end(), scheme) ==
      supported_schemes.end()) {
    std::string expected_schemes =
        ::arrow::internal::JoinStrings(supported_schemes, ", ");
    return Status::Invalid("The filesystem expected a URI with one of the schemes (",
                           expected_schemes, ") but received ", uri_string);
  }

  return path;
}

} // namespace arrow::fs
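The scheme check at the end of `PathFromUri` is a plain membership test over a small list, followed by an error message listing the accepted schemes. The sketch below reproduces that step with the standard library only. `CheckAzureScheme` and `JoinWithComma` are hypothetical names: `JoinWithComma` stands in for `::arrow::internal::JoinStrings`, and the function returns an error string (or `std::nullopt` on success) instead of an Arrow `Status`.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for ::arrow::internal::JoinStrings(parts, ", ").
std::string JoinWithComma(const std::vector<std::string>& parts) {
  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) out += ", ";
    out += parts[i];
  }
  return out;
}

// Returns std::nullopt if the scheme is supported, otherwise an error message
// shaped like the one built by Status::Invalid in the diff.
std::optional<std::string> CheckAzureScheme(const std::string& scheme,
                                            const std::string& uri_string) {
  static const std::vector<std::string> kSupportedSchemes = {"abfs", "abfss"};
  if (std::find(kSupportedSchemes.begin(), kSupportedSchemes.end(), scheme) !=
      kSupportedSchemes.end()) {
    return std::nullopt;
  }
  return "The filesystem expected a URI with one of the schemes (" +
         JoinWithComma(kSupportedSchemes) + ") but received " + uri_string;
}
```

Keeping the accepted schemes in a single list means the membership test and the error text cannot drift apart when a scheme is added.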
2 changes: 2 additions & 0 deletions cpp/src/arrow/filesystem/azurefs.h
@@ -367,6 +367,8 @@ class ARROW_EXPORT AzureFileSystem : public FileSystem {
  Result<std::shared_ptr<io::OutputStream>> OpenAppendStream(
      const std::string& path,
      const std::shared_ptr<const KeyValueMetadata>& metadata) override;

  Result<std::string> PathFromUri(const std::string& uri_string) const override;
};

} // namespace arrow::fs
9 changes: 9 additions & 0 deletions cpp/src/arrow/filesystem/azurefs_test.cc
@@ -2958,5 +2958,14 @@ TEST_F(TestAzuriteFileSystem, OpenInputFileClosed) {
  ASSERT_RAISES(Invalid, stream->ReadAt(1, 1));
  ASSERT_RAISES(Invalid, stream->Seek(2));
}

TEST_F(TestAzuriteFileSystem, PathFromUri) {
  ASSERT_EQ(
      "container/some/path",
      fs()->PathFromUri("abfss://storageacc.blob.core.windows.net/container/some/path"));
  ASSERT_EQ("container/some/path",
            fs()->PathFromUri("abfss://acc:pw@container/some/path"));
  ASSERT_RAISES(Invalid, fs()->PathFromUri("http://acc:pw@container/some/path"));
}
} // namespace fs
} // namespace arrow
