-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor cloud client storage samples. #421
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
#!/usr/bin/env python | ||
|
||
# Copyright 2016 Google, Inc. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
"""This application demonstrates how to upload and download encrypted blobs | ||
(objects) in Google Cloud Storage. | ||
|
||
Use `generate-encryption-key` to generate an example key: | ||
|
||
python encryption.py generate-encryption-key | ||
|
||
Then use the key to upload and download files encrypted with a custom key. | ||
|
||
For more information, see the README.md under /storage and the documentation | ||
at https://cloud.google.com/storage/docs/encryption. | ||
""" | ||
|
||
import argparse | ||
import base64 | ||
import os | ||
|
||
from gcloud import storage | ||
|
||
|
||
def generate_encryption_key(): | ||
"""Generates a 256 bit (32 byte) AES encryption key and prints the | ||
base64 representation. | ||
|
||
This is included for demonstration purposes. You should generate your own | ||
key. Please remember that encryption keys should be handled with a | ||
comprehensive security policy. | ||
""" | ||
key = os.urandom(32) | ||
encoded_key = base64.b64encode(key).decode('utf-8') | ||
print('Base 64 encoded encryption key: {}'.format(encoded_key)) | ||
|
||
|
||
def upload_encrypted_blob(bucket_name, source_file_name, | ||
destination_blob_name, base64_encryption_key): | ||
"""Uploads a file to a Google Cloud Storage bucket using a custom | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know we discussed this offline but...still think describing what the params are and their type may be valuable standard to set. Even if they are all strings, because it's good to document that and it's good standard to set. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I want to strike a balance between simplicity and clarity. I think that the idea is that the name of the variable and its usage should imply its type well enough - I feel it's overt that all of these parameters should be strings. @pfritzsche what are your thoughts? Overall, I want to avoid having a completely overbearing docstring here. I can be talked into them, but I tend towards less is more for samples. (For libraries, I'm the other way - super docstrings everywhere). Also, keep in mind that the main section also provides us a chance to give help strings for command-line arguments. Although only the bucket name has a help string in these samples. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tend to prefer self-documenting parameters, without explicit arg definitions in doc strings because 1) the doc strings become a hassle to maintain, and 2) they're often unnecessary and/or redundant. For example, I try to avoid things like:
When you have more complex input params (objects, dictionaries with variable keys), etc., then I think it becomes more worth it. tl;dr: In this case, I don't think it's necessary, but in some more complex cases I do. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, sounds like you two are leaning against it which is fine. To me, value of the doc string is 1) the type expected 2) setting the standard for all those cases where an extra sentence can help clarify thing. But don't feel strongly about it, just wanted to make sure we discussed/decided on it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a fair argument. I'm not opposed to defining types in docstrings There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 use your judgment :) In the canonical template sample, though, we should definitely include a method where the arguments are described (including its type), so codify the format we want it in. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For sure, I'll make sure that the sample guide shows an example of a more complex function that needs to be documented. |
||
encryption key. | ||
|
||
The file will be encrypted by Google Cloud Storage and only | ||
retrievable using the provided encryption key. | ||
""" | ||
storage_client = storage.Client() | ||
bucket = storage_client.get_bucket(bucket_name) | ||
blob = bucket.blob(destination_blob_name) | ||
|
||
# Encryption key must be an AES256 key represented as a bytestring with | ||
# 32 bytes. Since it's passed in as a base64 encoded string, it needs | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per our chat - I'm debating if the fact that you use base64 could throw people off. It's a side-effect of the way you produced the key, and not a requirement of the API. Maybe update this comment as follows? "Since this sample application uses a base64 encoded string to store our encryption key, it needs..." That way you can make it clear it's just a design choice, and not a functional requirement. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So it's also a side-effect of the way this function is called (from the command-line), but let me think on this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might make sense to do this using argparse - ie
which is nice because it localizes the side-effect to the command-line interface, and this function only has to care about what it cares about. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That seems a bit magical. So ostensibly gcloud-python will support base64 keys soon, so I won't have to decode them anyway. But even without that I'd rather do the decoding in the function do it doesn't seem magical. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But it's so cooooolll :) |
||
# to be decoded. | ||
encryption_key = base64.b64decode(base64_encryption_key) | ||
|
||
blob.upload_from_filename( | ||
source_file_name, encryption_key=encryption_key) | ||
|
||
print('File {} uploaded to {}.'.format( | ||
source_file_name, | ||
destination_blob_name)) | ||
|
||
|
||
def download_encrypted_blob(bucket_name, source_blob_name, | ||
destination_file_name, base64_encryption_key): | ||
"""Downloads a previously-encrypted blob from Google Cloud Storage. | ||
|
||
The encryption key provided must be the same key provided when uploading | ||
the blob. | ||
""" | ||
storage_client = storage.Client() | ||
bucket = storage_client.get_bucket(bucket_name) | ||
blob = bucket.blob(source_blob_name) | ||
|
||
# Encryption key must be an AES256 key represented as a bytestring with | ||
# 32 bytes. Since it's passed in as a base64 encoded string, it needs | ||
# to be decoded. | ||
encryption_key = base64.b64decode(base64_encryption_key) | ||
|
||
blob.download_to_filename( | ||
destination_file_name, encryption_key=encryption_key) | ||
|
||
print('Blob {} downloaded to {}.'.format( | ||
source_blob_name, | ||
destination_file_name)) | ||
|
||
|
||
def rotate_encryption_key(bucket_name, blob_name, base64_encryption_key, | ||
base64_new_encryption_key): | ||
"""Performs a key rotation by re-writing an encrypted blob with a new | ||
encryption key.""" | ||
raise NotImplementedError( | ||
'This is currently not available using the Cloud Client Library.') | ||
|
||
|
||
if __name__ == '__main__': | ||
parser = argparse.ArgumentParser( | ||
description=__doc__, | ||
formatter_class=argparse.RawDescriptionHelpFormatter) | ||
subparsers = parser.add_subparsers(dest='command') | ||
|
||
subparsers.add_parser( | ||
'generate-encryption-key', help=generate_encryption_key.__doc__) | ||
|
||
upload_parser = subparsers.add_parser( | ||
'upload', help=upload_encrypted_blob.__doc__) | ||
upload_parser.add_argument( | ||
'bucket_name', help='Your cloud storage bucket.') | ||
upload_parser.add_argument('source_file_name') | ||
upload_parser.add_argument('destination_blob_name') | ||
upload_parser.add_argument('base64_encryption_key') | ||
|
||
download_parser = subparsers.add_parser( | ||
'download', help=download_encrypted_blob.__doc__) | ||
download_parser.add_argument( | ||
'bucket_name', help='Your cloud storage bucket.') | ||
download_parser.add_argument('source_blob_name') | ||
download_parser.add_argument('destination_file_name') | ||
download_parser.add_argument('base64_encryption_key') | ||
|
||
rotate_parser = subparsers.add_parser( | ||
'rotate', help=rotate_encryption_key.__doc__) | ||
rotate_parser.add_argument( | ||
'bucket_name', help='Your cloud storage bucket.') | ||
download_parser.add_argument('blob_name') | ||
download_parser.add_argument('base64_encryption_key') | ||
download_parser.add_argument('base64_new_encryption_key') | ||
|
||
args = parser.parse_args() | ||
|
||
if args.command == 'generate-encryption-key': | ||
generate_encryption_key() | ||
elif args.command == 'upload': | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do you feel about making doing download_parser.set_default(func=encrypted_blog) ? Advantage is we kill this ugly big else, disadvantage is that our functions just take args and have to parse out the params themselves. Guessing you prefer this way? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, so I explicitly picked this pattern over that one. I don't like our functions just taking args. It's less clear, it makes testing slightly harder, and it's just doesn't feel right. We've worked around this in the past by having a '_command' function, eg:
but I don't like having to write those either. This is more direct and simple, but I'm open to arguments against it or for something else. |
||
upload_encrypted_blob( | ||
args.bucket_name, | ||
args.source_file_name, | ||
args.destination_blob_name, | ||
args.base64_encryption_key) | ||
elif args.command == 'download': | ||
download_encrypted_blob( | ||
args.bucket_name, | ||
args.source_blob_name, | ||
args.destination_file_name, | ||
args.base64_encryption_key) | ||
elif args.command == 'rotate': | ||
rotate_encryption_key( | ||
args.bucket_name, | ||
args.blob_name, | ||
args.base64_encryption_key, | ||
args.base64_new_encryption_key) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
# Copyright 2016 Google, Inc. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import base64 | ||
import tempfile | ||
|
||
import encryption | ||
from gcloud import storage | ||
import pytest | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this match PEP8? Specifically, encryption should be in a third section if so. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Due to this project's organization this is actually in the correct section. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting. How so? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. quirk of flake8-import-order, because I run flake8 from the root directory and this lives in a subfolder, it considers encryption part of the third-party section instead of the local section. I have an idea for a fix now that we're using nox, I'll see if I can get that to work in another PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good. |
||
|
||
TEST_ENCRYPTION_KEY = 'brtJUWneL92g5q0N2gyDSnlPSYAiIVZ/cWgjyZNeMy0=' | ||
TEST_ENCRYPTION_KEY_DECODED = base64.b64decode(TEST_ENCRYPTION_KEY) | ||
|
||
|
||
def test_generate_encryption_key(capsys): | ||
encryption.generate_encryption_key() | ||
out, _ = capsys.readouterr() | ||
encoded_key = out.split(':', 1).pop().strip() | ||
key = base64.b64decode(encoded_key) | ||
assert len(key) == 32, 'Returned key should be 32 bytes' | ||
|
||
|
||
def test_upload_encrypted_blob(cloud_config): | ||
with tempfile.NamedTemporaryFile() as source_file: | ||
source_file.write(b'test') | ||
|
||
encryption.upload_encrypted_blob( | ||
cloud_config.storage_bucket, | ||
source_file.name, | ||
'test_encrypted_upload_blob', | ||
TEST_ENCRYPTION_KEY) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is this testing? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are no asserts, but the sample is simple enough that there are only two possible branches: success or exception. If there is any exception, the case will fail. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. I was being silly and didn't realize it was a system test (as opposed to a mocked out call) |
||
|
||
|
||
@pytest.fixture | ||
def test_blob(cloud_config): | ||
"""Provides a pre-existing blob in the test bucket.""" | ||
bucket = storage.Client().bucket(cloud_config.storage_bucket) | ||
blob = bucket.blob('encrption_test_sigil') | ||
content = 'Hello, is it me you\'re looking for?' | ||
blob.upload_from_string( | ||
content, | ||
encryption_key=TEST_ENCRYPTION_KEY_DECODED) | ||
return blob.name, content | ||
|
||
|
||
def test_download_blob(test_blob, cloud_config): | ||
test_blob_name, test_blob_content = test_blob | ||
with tempfile.NamedTemporaryFile() as dest_file: | ||
encryption.download_encrypted_blob( | ||
cloud_config.storage_bucket, | ||
test_blob_name, | ||
dest_file.name, | ||
TEST_ENCRYPTION_KEY) | ||
|
||
downloaded_content = dest_file.read().decode('utf-8') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same here remove space in utf-8 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Space? You mean dash? The dash? The dash is present in the standard library On Thu, Jul 14, 2016, 4:43 PM puneith [email protected] wrote:
|
||
assert downloaded_content == test_blob_content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about returning the value, rather than printing it?
I'm of two minds about it:
print_
prefix to make this behavior clearThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonparrott remove the space in 'utf-8'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytests'
capsys
does a good job of making the return or not return not much of an issue for tests.There is no calling function, just the main clause. I'd prefer not to have the main clause do any printing.
All of these should be in the docs. :)
I think I'm going to say "Use your judgement on whether to return or print the value. If this function is the top of the stack, it likely makes sense to just print."