
Setting Content-Encoding header for Cloud Storage Uploads with upload_from_file #3099

Closed
brianjpetersen opened this issue Mar 5, 2017 · 15 comments

@brianjpetersen

I'm on Python 3.5.2 with google.cloud.storage.__version__ = '0.23.0'.

I'm attempting to upload objects to a bucket so that they support decompressive gzip transcoding. I haven't been able to figure out how to accomplish this after searching the documentation and the code, and after reviewing existing issues. My most promising attempt was setting the blob.content_encoding property, which seems like it should work but doesn't. See below for an example.

Does/can the API support this?

import google.cloud.storage
import gzip
import os
import requests
import datetime
import io


BUCKET_NAME = ...
GOOGLE_APPLICATION_CREDENTIALS_PATH = ...


os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = GOOGLE_APPLICATION_CREDENTIALS_PATH
client = google.cloud.storage.Client()
bucket = client.get_bucket(BUCKET_NAME)

blob = bucket.blob('plaintext')
blob.content_type = 'text/plain'
with io.BytesIO() as f:
    f.write(b' '.join(100 * (b'plaintext',)))
    blob.upload_from_file(f, size=f.tell(), rewind=True)
url = blob.generate_signed_url(datetime.datetime.max)
"""
This prints:

b'plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext' None
"""
response = requests.get(url)
print(response.content, response.headers.get('Content-Encoding', None))


blob = bucket.blob('compressed')
blob.content_type = 'text/plain'
blob.content_encoding = 'gzip'
with io.BytesIO() as f:
    with gzip.GzipFile(fileobj=f, mode='wb', compresslevel=9) as fgz:
        fgz.write(b' '.join(100*(b'compressed', )))
    blob.upload_from_file(f, size=f.tell(), rewind=True)
url = blob.generate_signed_url(datetime.datetime.max)
"""
This prints:

b'\x1f\x8b\x08\x00\xac}\xbbX\x02\xffK\xce\xcf-(J-.NMQH\x1ee\x8e2G\x99\xa3L2\x99\x00/\x80\x15\xa7K\x04\x00\x00' None
"""
response = requests.get(url)
print(response.content, response.headers.get('Content-Encoding', None))


"""
If I manually set the content-encoding header through the metadata option on this object in the console, I get the appropriate response:

b'compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed' gzip
"""
response = requests.get(url)
print(response.content, response.headers.get('Content-Encoding', None))
@daspecster added the api: storage label on Mar 6, 2017
@lukesneeringer (Contributor)

Hi @brianjpetersen,
Thanks for raising this, and sorry it took a couple of days for us to respond.

Let me summarize to make sure I understand the problem: there seems to be no obvious way to set the Content-Encoding metadata to gzip and have it stick in storage. Is that correct?

@lukesneeringer added the priority: p2 and type: bug labels on Mar 9, 2017
@brianjpetersen (Author)

That's right. Setting the content_encoding attribute to 'gzip' on the object before uploading doesn't actually result in proper transcoding on subsequent GETs to Cloud Storage. Furthermore, the metadata on the uploaded object (viewed in the Cloud Storage web console) doesn't reflect that the Content-Encoding was set to 'gzip' (see below).

[Screenshot: object metadata in the Cloud Storage web console, with no Content-Encoding set]
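
The same thing can be checked from Python without the console; here's a quick sketch using the bucket handle from my original example (bucket.get_blob re-fetches the stored metadata):

blob = bucket.get_blob('compressed')  # re-fetch the object's stored metadata
print(blob.content_type)              # 'text/plain'
print(blob.content_encoding)          # None, despite being set before upload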

@lukesneeringer (Contributor)

Thanks. We will look into it.

@pdknsk commented Mar 18, 2017

This is a duplicate of several bugs, summed up in this comment, which also has a workaround.
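
The gist of that workaround: upload first, then set the property with a second, separate request. A rough sketch (assuming blob.patch() sends the metadata update; note this isn't atomic, so the object is briefly served without the gzip Content-Encoding between the two calls):

blob = bucket.blob('file.txt')
blob.upload_from_string(gzipped_bytes)  # gzip-compressed payload, prepared up front
blob.content_encoding = 'gzip'
blob.patch()  # second request: PATCH the stored metadata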

@brianjpetersen (Author)

Many thanks @pdknsk.

This workaround fixes the problem, although as noted in #754, the property update isn't atomic, which has all sorts of nasty implications. As another commenter noted in a linked thread, this unfortunately prevents me from using gcloud-python (and Google Cloud Platform) at this time.

@pdknsk commented Mar 19, 2017

I remembered a patch I had once used, which I've updated now. An alternative is to use the API directly, which is more complex.
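
By "use the API directly" I mean a multipart insert against the JSON API, with contentEncoding carried in the metadata part. A hypothetical sketch with requests (you supply your own access token and bucket name):

import gzip
import json
import requests

TOKEN = '...'         # OAuth2 access token with a storage scope (hypothetical)
BUCKET = 'my-bucket'  # hypothetical bucket name

text_gzip = gzip.compress(100 * b'text ')  # Python 3: the gzipped payload

metadata = {
    'name': 'file.txt',
    'contentType': 'text/plain',
    'contentEncoding': 'gzip',
}
boundary = b'====gcs-multipart===='
body = b'\r\n'.join([
    b'--' + boundary,
    b'Content-Type: application/json; charset=UTF-8',
    b'',
    json.dumps(metadata).encode('utf-8'),
    b'--' + boundary,
    b'Content-Type: text/plain',
    b'',
    text_gzip,
    b'--' + boundary + b'--',
])
response = requests.post(
    'https://www.googleapis.com/upload/storage/v1/b/%s/o' % BUCKET,
    params={'uploadType': 'multipart'},
    headers={'Authorization': 'Bearer %s' % TOKEN,
             'Content-Type': 'multipart/related; boundary=' + boundary.decode()},
    data=body,
)
response.raise_for_status()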

@brianjpetersen (Author)

This unfortunately didn't seem to do the trick for the content_encoding property.

@pdknsk commented Mar 19, 2017

Works for me.

>>> compressobj = zlib.compressobj(9, zlib.DEFLATED, 31) # 31 = gzip
>>> text_gzip = compressobj.compress('text') + compressobj.flush()
>>> len(text_gzip)
24
>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.content_encoding = 'gzip'
>>> text.upload_from_string(text_gzip)
>>> text.reload()
>>> text.size
24
>>> req = requests.get(text.public_url)
>>> req.content
'text'
>>> req.headers.get('Content-Encoding')
'gzip'

In the browser too.

@brianjpetersen (Author)

Apologies @pdknsk, pip and I weren't getting along last night. This does indeed address my need. You've been super helpful - thanks.

@brianjpetersen (Author) commented Mar 19, 2017

Although this contrived example works, I'm now getting a gzip-decoding error from requests with larger payloads. Using your example (slightly modified for Python 3):

>>> compressobj = zlib.compressobj(9, zlib.DEFLATED, 31) # 31 = gzip
>>> text_gzip = compressobj.compress(100*b'text') + compressobj.flush()
>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.content_encoding = 'gzip'
>>> text.upload_from_string(text_gzip)
>>> req = requests.get(text.public_url)

Traceback (most recent call last):
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 192, in _decode
    data = self._decoder.decompress(data)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 58, in decompress
    return self._obj.decompress(data)
zlib.error: Error -3 while decompressing data: invalid distance too far back

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/models.py", line 664, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 349, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 503, in read_chunked
    flush_decoder=False)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 197, in _decode
    "failed to decode it." % content_encoding, e)
requests.packages.urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: invalid distance too far back',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 47, in <module>
    response = requests.get(url)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/api.py", line 71, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/sessions.py", line 617, in send
    r.content
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/models.py", line 741, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/models.py", line 669, in generate
    raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: invalid distance too far back',))

Is this possibly related to #1724?

@daspecster (Contributor)

@brianjpetersen Googling that error for requests got me to this SO question, which led me to the following issue.

See: https://bugs.python.org/issue27164

It sounds like there's an issue with Python 3.5.2. If you upgrade, do you still have the same issue?
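
If upgrading isn't an option right away, one way to sidestep the client-side decode entirely is to not advertise gzip support, relying on GCS's decompressive transcoding to serve the payload already decompressed. A sketch, reusing the text blob from the snippets above:

# The buggy local zlib never runs: GCS transcodes server-side instead.
req = requests.get(text.public_url, headers={'Accept-Encoding': 'identity'})
print(req.content)                          # the plain, decompressed payload
print(req.headers.get('Content-Encoding'))  # no gzip encoding on the wire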

@brianjpetersen (Author)

That seems to be it. It's working on my 2.7 binary. Thanks.

@daspecster (Contributor)

OK great! I'm going to close this then.

@danielguardicore

Using Python 2.7 and the latest (as of this writing) google-cloud module, this problem still occurs when using upload_from_string.

@allardhoeve

To make it even stranger, I get this intermittently on Python 3.7.1 and latest google-cloud-python.

