Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor cloud client storage samples. #421

Merged
merged 3 commits into from
Jul 15, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
107 changes: 0 additions & 107 deletions storage/cloud-client/customer_supplied_keys.py

This file was deleted.

24 changes: 0 additions & 24 deletions storage/cloud-client/customer_supplied_keys_test.py

This file was deleted.

161 changes: 161 additions & 0 deletions storage/cloud-client/encryption.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
#!/usr/bin/env python

# Copyright 2016 Google, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This application demonstrates how to upload and download encrypted blobs
(objects) in Google Cloud Storage.

Use `generate-encryption-key` to generate an example key:

python encryption.py generate-encryption-key

Then use the key to upload and download files encrypted with a custom key.

For more information, see the README.md under /storage and the documentation
at https://cloud.google.com/storage/docs/encryption.
"""

import argparse
import base64
import os

from gcloud import storage


def generate_encryption_key():
"""Generates a 256 bit (32 byte) AES encryption key and prints the
base64 representation.

This is included for demonstration purposes. You should generate your own
key. Please remember that encryption keys should be handled with a
comprehensive security policy.
"""
key = os.urandom(32)
encoded_key = base64.b64encode(key).decode('utf-8')
print('Base 64 encoded encryption key: {}'.format(encoded_key))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about returning the value, rather than printing it?
I'm of two minds about it:

  • I think in a real-world scenario / reading this script directly, it makes more sense to return the value and have the calling function do the printing
  • As a code snippet, though, it's cleaner to just print the value - ie if you're just including this method as a snippet in the docs. In this case, however, I usually name the function with a print_ prefix to make this behavior clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonparrott remove the space in 'utf-8'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytests' capsys does a good job of making the return or not return not much of an issue for tests.

it makes more sense to return the value and have the calling function do the printing

There is no calling function, just the main clause. I'd prefer not to have the main clause do any printing.

if you're just including this method as a snippet in the docs.

All of these should be in the docs. :)

I think I'm going to say "Use your judgement on whether to return or print the value. If this function is the top of the stack, it likely makes sense to just print."



def upload_encrypted_blob(bucket_name, source_file_name,
destination_blob_name, base64_encryption_key):
"""Uploads a file to a Google Cloud Storage bucket using a custom
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we discussed this offline but...still think describing what the params are and their type may be valuable standard to set. Even if they are all strings, because it's good to document that and it's good standard to set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to strike a balance between simplicity and clarity. I think that the idea is that the name of the variable and its usage should imply its type well enough - I feel it's overt that all of these parameters should be strings. @pfritzsche what are your thoughts?

Overall, I want to avoid having a completely overbearing docstring here. I can be talked into them, but I tend towards less is more for samples. (For libraries, I'm the other way - super docstrings everywhere).

Also, keep in mind that the main section also provides us a chance to give help strings for command-line arguments. Although only the bucket name has a help string in these samples.

Copy link
Contributor

@pfritzsche pfritzsche Jul 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer self-documenting parameters, without explicit arg definitions in doc strings because 1) the doc strings become a hassle to maintain, and 2) they're often unnecessary and/or redundant. For example, I try to avoid things like:

def upload_blob(bucket_name):
    """Keyword arguments:
        bucket_name: the name of the bucket to upload to
    """

When you have more complex input params (objects, dictionaries with variable keys), etc., then I think it becomes more worth it.

tl;dr: In this case, I don't think it's necessary, but in some more complex cases I do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, sounds like you two are leaning against it which is fine. To me, value of the doc string is 1) the type expected 2) setting the standard for all those cases where an extra sentence can help clarify thing. But don't feel strongly about it, just wanted to make sure we discussed/decided on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair argument. I'm not opposed to defining types in docstrings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 use your judgment :)

In the canonical template sample, though, we should definitely include a method where the arguments are described (including its type), so codify the format we want it in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the canonical template sample, though, we should definitely include a method where the arguments are described (including its type), so codify the format we want it in.

For sure, I'll make sure that the sample guide shows an example of a more complex function that needs to be documented.

encryption key.

The file will be encrypted by Google Cloud Storage and only
retrievable using the provided encryption key.
"""
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(destination_blob_name)

# Encryption key must be an AES256 key represented as a bytestring with
# 32 bytes. Since it's passed in as a base64 encoded string, it needs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our chat - I'm debating if the fact that you use base64 could throw people off. It's a side-effect of the way you produced the key, and not a requirement of the API.

Maybe update this comment as follows? "Since this sample application uses a base64 encoded string to store our encryption key, it needs..."

That way you can make it clear it's just a design choice, and not a functional requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's also a side-effect of the way this function is called (from the command-line), but let me think on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to do this using argparse - ie

upload_parser.add_argument('base64_encryption_key', type=base64.b64decode)

which is nice because it localizes the side-effect to the command-line interface, and this function only has to care about what it cares about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems a bit magical. So ostensibly gcloud-python will support base64 keys soon, so I won't have to decode them anyway. But even without that I'd rather do the decoding in the function do it doesn't seem magical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's so cooooolll :)

# to be decoded.
encryption_key = base64.b64decode(base64_encryption_key)

blob.upload_from_filename(
source_file_name, encryption_key=encryption_key)

print('File {} uploaded to {}.'.format(
source_file_name,
destination_blob_name))


def download_encrypted_blob(bucket_name, source_blob_name,
destination_file_name, base64_encryption_key):
"""Downloads a previously-encrypted blob from Google Cloud Storage.

The encryption key provided must be the same key provided when uploading
the blob.
"""
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(source_blob_name)

# Encryption key must be an AES256 key represented as a bytestring with
# 32 bytes. Since it's passed in as a base64 encoded string, it needs
# to be decoded.
encryption_key = base64.b64decode(base64_encryption_key)

blob.download_to_filename(
destination_file_name, encryption_key=encryption_key)

print('Blob {} downloaded to {}.'.format(
source_blob_name,
destination_file_name))


def rotate_encryption_key(bucket_name, blob_name, base64_encryption_key,
base64_new_encryption_key):
"""Performs a key rotation by re-writing an encrypted blob with a new
encryption key."""
raise NotImplementedError(
'This is currently not available using the Cloud Client Library.')


if __name__ == '__main__':
parser = argparse.ArgumentParser(
description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter)
subparsers = parser.add_subparsers(dest='command')

subparsers.add_parser(
'generate-encryption-key', help=generate_encryption_key.__doc__)

upload_parser = subparsers.add_parser(
'upload', help=upload_encrypted_blob.__doc__)
upload_parser.add_argument(
'bucket_name', help='Your cloud storage bucket.')
upload_parser.add_argument('source_file_name')
upload_parser.add_argument('destination_blob_name')
upload_parser.add_argument('base64_encryption_key')

download_parser = subparsers.add_parser(
'download', help=download_encrypted_blob.__doc__)
download_parser.add_argument(
'bucket_name', help='Your cloud storage bucket.')
download_parser.add_argument('source_blob_name')
download_parser.add_argument('destination_file_name')
download_parser.add_argument('base64_encryption_key')

rotate_parser = subparsers.add_parser(
'rotate', help=rotate_encryption_key.__doc__)
rotate_parser.add_argument(
'bucket_name', help='Your cloud storage bucket.')
download_parser.add_argument('blob_name')
download_parser.add_argument('base64_encryption_key')
download_parser.add_argument('base64_new_encryption_key')

args = parser.parse_args()

if args.command == 'generate-encryption-key':
generate_encryption_key()
elif args.command == 'upload':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about making doing download_parser.set_default(func=encrypted_blog) ? Advantage is we kill this ugly big else, disadvantage is that our functions just take args and have to parse out the params themselves. Guessing you prefer this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so I explicitly picked this pattern over that one.

I don't like our functions just taking args. It's less clear, it makes testing slightly harder, and it's just doesn't feel right. We've worked around this in the past by having a '_command' function, eg:

list_objects(bucket_name):
    ...

list_objects_command(args):
    list_objects(args.bucket_name)

but I don't like having to write those either. This is more direct and simple, but I'm open to arguments against it or for something else.

upload_encrypted_blob(
args.bucket_name,
args.source_file_name,
args.destination_blob_name,
args.base64_encryption_key)
elif args.command == 'download':
download_encrypted_blob(
args.bucket_name,
args.source_blob_name,
args.destination_file_name,
args.base64_encryption_key)
elif args.command == 'rotate':
rotate_encryption_key(
args.bucket_name,
args.blob_name,
args.base64_encryption_key,
args.base64_new_encryption_key)
67 changes: 67 additions & 0 deletions storage/cloud-client/encryption_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
# Copyright 2016 Google, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import base64
import tempfile

import encryption
from gcloud import storage
import pytest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this match PEP8?

Specifically, encryption should be in a third section if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to this project's organization this is actually in the correct section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. How so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quirk of flake8-import-order, because I run flake8 from the root directory and this lives in a subfolder, it considers encryption part of the third-party section instead of the local section. I have an idea for a fix now that we're using nox, I'll see if I can get that to work in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.


TEST_ENCRYPTION_KEY = 'brtJUWneL92g5q0N2gyDSnlPSYAiIVZ/cWgjyZNeMy0='
TEST_ENCRYPTION_KEY_DECODED = base64.b64decode(TEST_ENCRYPTION_KEY)


def test_generate_encryption_key(capsys):
encryption.generate_encryption_key()
out, _ = capsys.readouterr()
encoded_key = out.split(':', 1).pop().strip()
key = base64.b64decode(encoded_key)
assert len(key) == 32, 'Returned key should be 32 bytes'


def test_upload_encrypted_blob(cloud_config):
with tempfile.NamedTemporaryFile() as source_file:
source_file.write(b'test')

encryption.upload_encrypted_blob(
cloud_config.storage_bucket,
source_file.name,
'test_encrypted_upload_blob',
TEST_ENCRYPTION_KEY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no asserts, but the sample is simple enough that there are only two possible branches: success or exception. If there is any exception, the case will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I was being silly and didn't realize it was a system test (as opposed to a mocked out call)



@pytest.fixture
def test_blob(cloud_config):
"""Provides a pre-existing blob in the test bucket."""
bucket = storage.Client().bucket(cloud_config.storage_bucket)
blob = bucket.blob('encrption_test_sigil')
content = 'Hello, is it me you\'re looking for?'
blob.upload_from_string(
content,
encryption_key=TEST_ENCRYPTION_KEY_DECODED)
return blob.name, content


def test_download_blob(test_blob, cloud_config):
test_blob_name, test_blob_content = test_blob
with tempfile.NamedTemporaryFile() as dest_file:
encryption.download_encrypted_blob(
cloud_config.storage_bucket,
test_blob_name,
dest_file.name,
TEST_ENCRYPTION_KEY)

downloaded_content = dest_file.read().decode('utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here remove space in utf-8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space? You mean dash? The dash? The dash is present in the standard library
codecs module and is generally accepted as the right alias for codecs.UTF8

On Thu, Jul 14, 2016, 4:43 PM puneith [email protected] wrote:

In storage/cloud-client/encryption_test.py
#421 (comment)
:

  • blob.upload_from_string(
  •    content,
    
  •    encryption_key=TEST_ENCRYPTION_KEY_DECODED)
    
  • return blob.name, content

+def test_download_blob(test_blob, cloud_config):

  • test_blob_name, test_blob_content = test_blob
  • with tempfile.NamedTemporaryFile() as dest_file:
  •    encryption.download_encrypted_blob(
    
  •        cloud_config.storage_bucket,
    
  •        test_blob_name,
    
  •        dest_file.name,
    
  •        TEST_ENCRYPTION_KEY)
    
  •    downloaded_content = dest_file.read().decode('utf-8')
    

same here remove space in utf-8


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/GoogleCloudPlatform/python-docs-samples/pull/421/files/6488d282cbc7c7bc4505b0018b41e6249da201e6#r70902628,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAPUcw7_ZxlTr8NBXtMScd_Y8-4ycnnRks5qVsmEgaJpZM4JM3oD
.

assert downloaded_content == test_blob_content
Loading