[Upload] Handle both Files and InputStreams #2727
Comments
Hi, a large reason TransferManager only accepts files is that a file can be re-read if an upload needs to be retried, which isn't possible with a one-shot InputStream. Transferring objects between buckets is something Storage Transfer Service has been purpose-built to perform in a managed, performant manner. A GCS bucket can be both a source and a sink. The example of transitioning all objects to the Nearline storage class should give you an idea of how to get started: https://cloud.google.com/storage-transfer/docs/create-transfers#client-libraries (then click the Java tab).
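For reference, here is a minimal sketch of creating a bucket-to-bucket transfer job with the Storage Transfer Service Java client (google-cloud-storage-transfer). This is not from the thread; the project ID, bucket names, and the choice to leave the job unscheduled are assumptions:

import com.google.storagetransfer.v1.proto.StorageTransferServiceClient;
import com.google.storagetransfer.v1.proto.TransferProto;
import com.google.storagetransfer.v1.proto.TransferTypes;

try (StorageTransferServiceClient sts = StorageTransferServiceClient.create()) {
  // placeholder project and bucket names
  TransferTypes.TransferJob job = TransferTypes.TransferJob.newBuilder()
      .setProjectId("<my-project>")
      .setTransferSpec(TransferTypes.TransferSpec.newBuilder()
          .setGcsDataSource(TransferTypes.GcsData.newBuilder().setBucketName("<source-bucket>"))
          .setGcsDataSink(TransferTypes.GcsData.newBuilder().setBucketName("<sink-bucket>")))
      .setStatus(TransferTypes.TransferJob.Status.ENABLED)
      .build();
  // create the job; with no schedule it can be run on demand via runTransferJob
  TransferTypes.TransferJob created = sts.createTransferJob(
      TransferProto.CreateTransferJobRequest.newBuilder().setTransferJob(job).build());
  System.out.println("Created transfer job: " + created.getName());
}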
That makes sense. Thanks @BenWhitehead for the info. (Sorry for the late reply, I was away.)
We want to decompress some gzip/zip files in GCS.
If STS can't do it, the most reliable way (both the reader and writer have transparent retries under the hood) to make it happen would be to do the following:

import com.google.cloud.ReadChannel;
import com.google.cloud.WriteChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Storage.BlobSourceOption;
import com.google.cloud.storage.Storage.BlobWriteOption;
import com.google.cloud.storage.StorageOptions;
import com.google.common.io.ByteStreams;

StorageOptions options = StorageOptions.newBuilder().build();
try (Storage s = options.getService()) {
  BlobId from = BlobId.of("<some-bucket-1>", "<some-object-1>");
  BlobId to = BlobId.of("<some-bucket-2>", from.getName());
  try (ReadChannel r = s.reader(from,
          // pass the option to ensure the contents are gunzipped
          BlobSourceOption.shouldReturnRawInputStream(false));
       WriteChannel w = s.writer(BlobInfo.newBuilder(to).build(), BlobWriteOption.doesNotExist())) {
    // disable buffering in the read channel
    r.setChunkSize(0);
    // set this to something smaller if you want to reduce the amount of buffering
    // w.setChunkSize(16 * 1024 * 1024);
    ByteStreams.copy(r, w);
  }
}

I've tested the above code and can attest to it working. My source object was a 1.5MiB gzip'ed text file; when gunzipped and copied, it expanded to 512MiB. Since you mentioned memory footprint as being important, I've made a couple of tweaks to chunkSize to change from the defaults. Since you are going to be writing more bytes than you are reading, the write will end up being the slower of the two.
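Not from the thread, but since .zip archives were mentioned earlier and shouldReturnRawInputStream only handles gzip-encoded objects, here is a hedged sketch of exploding a zip archive into individual objects by wrapping the ReadChannel in a java.util.zip.ZipInputStream. The bucket and object names are placeholders, and using each entry's name as the destination object name is an assumption:

import com.google.cloud.ReadChannel;
import com.google.cloud.WriteChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Storage.BlobWriteOption;
import com.google.cloud.storage.StorageOptions;
import com.google.common.io.ByteStreams;
import java.nio.channels.Channels;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

StorageOptions options = StorageOptions.newBuilder().build();
try (Storage s = options.getService()) {
  BlobId from = BlobId.of("<some-bucket-1>", "<some-archive>.zip");
  try (ReadChannel r = s.reader(from);
       ZipInputStream zin = new ZipInputStream(Channels.newInputStream(r))) {
    ZipEntry entry;
    while ((entry = zin.getNextEntry()) != null) {
      if (entry.isDirectory()) continue;
      BlobId to = BlobId.of("<some-bucket-2>", entry.getName());
      try (WriteChannel w = s.writer(BlobInfo.newBuilder(to).build(), BlobWriteOption.doesNotExist())) {
        // copy the current entry only; ZipInputStream returns EOF at the entry boundary
        ByteStreams.copy(Channels.newChannel(zin), w);
      }
      zin.closeEntry();
    }
  }
}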
Thanks @BenWhitehead for the proposal. I will try it out and see the results.
Description
We use Google Cloud Storage to download files, decompress them, and upload the decompressed files back to Google Cloud Storage. The problem is that we rely on an InputStream to avoid overloading the application's heap memory. For that reason, we want uploads to handle both files and input streams.
Solution
I drafted PR #2728 as an example of what we need.
Alternatives
Sticking to the normal upload with the Google Cloud Storage client.
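For completeness, a minimal sketch of this alternative using Storage#createFrom with an InputStream, which reads the stream chunk by chunk rather than materializing the whole payload in heap; openDecompressedStream() and the bucket/object names are hypothetical placeholders:

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.InputStream;

StorageOptions options = StorageOptions.newBuilder().build();
try (Storage storage = options.getService()) {
  BlobInfo info = BlobInfo.newBuilder(BlobId.of("<bucket>", "<object>")).build();
  try (InputStream in = openDecompressedStream()) { // hypothetical source stream
    // createFrom uploads in bounded chunks, so heap usage stays flat
    storage.createFrom(info, in);
  }
}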