Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud storage rewrite with GCloud? #1960

Closed
hohaichi opened this issue Jul 6, 2016 · 27 comments
Closed

Cloud storage rewrite with GCloud? #1960

hohaichi opened this issue Jul 6, 2016 · 27 comments
Assignees
Labels
api: storage Issues related to the Cloud Storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@hohaichi
Copy link

hohaichi commented Jul 6, 2016

Does GCloud support Google Cloud Storage rewrite?

@daspecster
Copy link
Contributor

I don't believe we do at this time.

@hohaichi
Copy link
Author

hohaichi commented Jul 6, 2016

Do you have any plan to add rewrite? It is hard for the user to rotate their encryption key without rewrite.

@daspecster
Copy link
Contributor

That's a great use case!

@tseaver & @dhermes WDYT? I can start this today or tomorrow if there are not blockers.

@daspecster daspecster added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. api: storage Issues related to the Cloud Storage API. labels Jul 6, 2016
@theacodes
Copy link
Contributor

theacodes commented Jul 14, 2016

@tseaver @dhermes @jgeewax this is also a blocking issue for us to be able to re-write storage usage samples using this library.

@tseaver
Copy link
Contributor

tseaver commented Jul 14, 2016

Based on the rewrite API docs, it seems like this should be a method on Bucket, parallel to Bucket.copy_blob. Something like:

    def rewrite_blob(self, blob, destination_bucket, new_name, client=None):
        """Rewrite the given blob to the given bucket, with a new name.

        :type blob: :class:`gcloud.storage.blob.Blob`
        :param blob: The blob to be copied.

        :type destination_bucket: :class:`gcloud.storage.bucket.Bucket`
        :param destination_bucket: The bucket into which the blob should be
                                   copied.

        :type new_name: string
        :param new_name: (optional) the new name for the copied file.

        :type client: :class:`gcloud.storage.client.Client` or ``NoneType``
        :param client: Optional. The client to use.  If not passed, falls back
                       to the ``client`` stored on the current bucket.

        :rtype: bytes
        :returns: The new Blob.
        """
        client = self._require_client(client)
        if new_name is None:
            new_name = blob.name
        new_blob = Blob(bucket=destination_bucket, name=new_name)
        api_path = blob.path + '/rewriteTo' + new_blob.path
        resource = dict(blob._properties)
        result = client.connection.api_request(
            method='POST', path=api_path, data=resource,
            _target_object=new_blob)
        token = result.get('rewriteToken')
        while token not in (None, ''):
            query_params = {'rewriteToken': token}
            result = client.connection.api_request(
                method='POST', path=api_path, query_params=query_params,
                _target_object=new_blob)
            token = result.get('rewriteToken')
        new_blob._set_properties(result['resource'])
        return new_blob

@theacodes
Copy link
Contributor

theacodes commented Jul 14, 2016

That would be fine, alternatively blob.rewrite_to(destination_name, source_encryption_key=..., encryption_key=...) would be acceptable as well.

N.B. this method must take two encryption key parameters, one for the source and one for the destination. This a the primary use case of the rewrite method, other than just moving objects between buckets.

@tseaver
Copy link
Contributor

tseaver commented Jul 15, 2016

@jonparrott Where in the rewrite request are the keys passed? I see fields in the Object resource which are metadata about the encryption, but not the key itself.

@theacodes
Copy link
Contributor

theacodes commented Jul 15, 2016

In the headers, hilariously enough.

See the current sample here.

@tseaver
Copy link
Contributor

tseaver commented Jul 15, 2016

@jonparrott Why are those values not part of the payload? Worse, why is that not documented?

@theacodes
Copy link
Contributor

@tseaver I have no idea - this API predates me joining and I'm not part of the API review team. My hunch is that it was a way to get around releasing a new API version. Maybe @Capstan can provide some background.

@tseaver
Copy link
Contributor

tseaver commented Jul 15, 2016

/me grumbles This is exactly the kind of thing that is supposed to make you release a new API version.

@theacodes
Copy link
Contributor

@tseaver don't shoot the messenger. I thought it was whack when I had to write the samples for it using google-api-python-client, because it's awkward to add headers using that library.

@tseaver
Copy link
Contributor

tseaver commented Jul 15, 2016

@jonparrott I'm grumbling about that (hypothetical) motive, which predates you. New API release or not, the API docs should still be updated to define the (added?) behaviors: how did you learn about those headers in the first place?

@theacodes
Copy link
Contributor

@Capstan
Copy link

Capstan commented Aug 10, 2016

@tseaver I've filed an internal bug about Rewrite docs not saying enough about how to use it with customer-supplied encryption keys. I'm not entirely sure why it couldn't have been added to the post body, but I do know that ensuring that no key is ever persisted anywhere is of paramount importance and may have guided that decision.

@theacodes
Copy link
Contributor

Bump, any update on this?

@theacodes theacodes reopened this Sep 13, 2016
@daspecster
Copy link
Contributor

It looks like it was last updated August 19th,
And now it talks about the headers. I'm not sure if that was there before or if that's what we were looking for?

https://cloud.google.com/storage/docs/encryption#request

@hohaichi
Copy link
Author

Re: object resource having encryption key metadata but not key materials: It is a little bit confusing in the first glance, as object resources can appear in both requests and responses. But if you look more closely, the table denotes which fields are "writable". Only those fields can be updated. Key metadata cannot be updated, they are only helpful in responses.

Re: why the keys are passed in as headers rather than the post body: Actually adding the keys to the request body is not as straightforward as it seems. In our current API, a request body either contains object data or object metadata. And key materials are neither.

@theacodes
Copy link
Contributor

@tseaver @dhermes can I get a release of the storage client so I can update the sample?

@tseaver
Copy link
Contributor

tseaver commented Oct 28, 2016

@dhermes I'm not quite sure how you've made sub-project releases after the umbrella-0.20.0 tag. I just ran $ git log umbrella-0.20.0..HEAD -- storage/ and got the following:

@dhermes
Copy link
Contributor

dhermes commented Oct 28, 2016

@tseaver I think it's more robust just to use the GitHub UI (since people can merge PRs however they like now, and we haven't disabled the other two options, though we can):

https://github.com/GoogleCloudPlatform/google-cloud-python/commits/master/storage?since=2016-09-27

@tseaver
Copy link
Contributor

tseaver commented Oct 28, 2016

@dhermes FWIW, I wasn't grepping for "Merged" commits: I looked at the whole history for /storage since that tag.

@dhermes
Copy link
Contributor

dhermes commented Oct 28, 2016

Haha DERP. I should've read closely.

@theacodes
Copy link
Contributor

Bump - I can has release?

@dhermes
Copy link
Contributor

dhermes commented Oct 31, 2016

@jonparrott It's not as easy as it seems. It requires a core release as well, which has implications on all packages.

@theacodes
Copy link
Contributor

The core change being the iterator work? If so, I can wait.

@dhermes
Copy link
Contributor

dhermes commented Oct 31, 2016

The core change being the iterator work?

Yes

richkadel pushed a commit to richkadel/google-cloud-python that referenced this issue May 6, 2017
Avoids the need to plumb it through all the 'upload' and 'download'
methods.

Convert '_set_encryption_headers(key, headers)' into a pure function,
'_get_encryption_headers(key)', returning a new dict.

Preparing for use of encryption in to-be-added 'Blob.rewrite(source_blob)'
method.  See: googleapis#1960.
richkadel pushed a commit to richkadel/google-cloud-python that referenced this issue May 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

6 participants