forked from apache/arrow
apacheGH-38333: [C++][FS][Azure] Implement file writes (apache#38780)
### Rationale for this change

Writing files is an important part of the filesystem.

### What changes are included in this PR?

Implements `OpenOutputStream` and `OpenAppendStream` for Azure.

- Initially I started with the implementation from apache#12914 but I made quite a few changes:
  - Removed the different code path for hierarchical namespace accounts. There should not be any performance advantage to using special APIs only available on hierarchical namespace accounts.
  - Only implement `ObjectAppendStream`, not `ObjectOutputStream`. `OpenOutputStream` is implemented by truncating the existing file and then returning an `ObjectAppendStream`.
  - More precise use of `try` `catch`. Every call to Azure is wrapped in a `try` `catch` and should return a descriptive error status.
  - Avoid unnecessary calls to Azure. For example, we now maintain the block list in memory and commit it only once on flush. apache#12914 committed the block list after each block that was staged and, on flush, queried Azure to get the list of uncommitted blocks. The new approach is consistent with the Azure fsspec implementation https://github.com/fsspec/adlfs/blob/092685f102c5cd215550d10e8347e5bce0e2b93d/adlfs/spec.py#L2009
  - Adjust the block_ids slightly to minimise the risk of them conflicting with blocks written by other blob storage clients.
  - Implement metadata writes. Includes adding default metadata to `AzureOptions`.
- Tests are based on `gcsfs_test.cc`, but I added a couple of extra ones.
- Handle the TODO(apacheGH-38780) comments for using the Azure fs to write data in tests.

### Are these changes tested?

Yes. Everything should be covered by azurite tests.

### Are there any user-facing changes?

Yes. The Azure filesystem now supports file writes.

* Closes: apache#38333

Lead-authored-by: Thomas Newton <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
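
The block list handling described above (stage each block, keep the list of block IDs in memory, commit once on flush, and implement `OpenOutputStream` as truncate plus append) can be illustrated with a minimal, self-contained sketch. This is not the code from this commit and it does not call the Azure SDK: `MockBlockBlob`, `AppendStreamSketch`, and the `-sketch` block ID suffix are hypothetical names invented for the illustration, and the mock only imitates stage/commit semantics in memory.

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a block blob (NOT the Azure SDK).
// Staged blocks are invisible until their IDs appear in a committed list.
struct MockBlockBlob {
  std::map<std::string, std::string> staged;     // uncommitted blocks by ID
  std::map<std::string, std::string> committed;  // committed blocks by ID
  std::vector<std::string> committed_ids;        // committed order
  int commit_calls = 0;                          // counts service round trips

  void StageBlock(const std::string& id, const std::string& data) {
    staged[id] = data;
  }

  // The committed content becomes exactly the listed blocks, in order.
  // Each ID may refer to a staged block or an already committed one.
  void CommitBlockList(const std::vector<std::string>& ids) {
    std::map<std::string, std::string> next;
    for (const std::string& id : ids) {
      auto it = staged.find(id);
      next[id] = (it != staged.end()) ? it->second : committed.at(id);
    }
    committed = std::move(next);
    committed_ids = ids;
    staged.clear();
    ++commit_calls;
  }

  std::string Contents() const {
    std::string out;
    for (const std::string& id : committed_ids) out += committed.at(id);
    return out;
  }
};

// Hypothetical append stream: one code path serves both "output" and
// "append" modes, differing only in whether the blob is truncated first.
class AppendStreamSketch {
 public:
  AppendStreamSketch(MockBlockBlob* blob, bool truncate) : blob_(blob) {
    if (truncate) {
      blob_->CommitBlockList({});  // empty list == empty blob
    } else {
      block_ids_ = blob_->committed_ids;  // continue after existing blocks
    }
  }

  // Stage the data and remember the ID locally; nothing is committed yet.
  void Write(const std::string& data) {
    std::string id = NextBlockId();
    blob_->StageBlock(id, data);
    block_ids_.push_back(id);
  }

  // A single commit publishes every block staged since the last flush.
  void Flush() { blob_->CommitBlockList(block_ids_); }

 private:
  // Fixed-width counter plus a suffix (assumed scheme) so IDs sort
  // consistently and are unlikely to clash with other writers' IDs.
  std::string NextBlockId() const {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%05zu-sketch", block_ids_.size());
    return buf;
  }

  MockBlockBlob* blob_;
  std::vector<std::string> block_ids_;
};
```

In this sketch, two `Write` calls followed by one `Flush` cost one commit rather than two, and no query for uncommitted blocks is needed because the stream already holds the full ID list.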
1 parent 5a0e8b6, commit c1b12ca
Showing 3 changed files with 429 additions and 23 deletions.