`_write_to_stream()` in the `Download` class forces decompression of the response even if "Accept-Encoding: gzip" was present in the request #49

houglum · 2018-09-19T23:41:33Z

Note for Googlers: This is the follow-up for an internal-only-visible issue at https://issuetracker.google.com/issues/115343385.

The Decompressive Transcoding docs for Cloud Storage [1] state that if an object's Content-Encoding is set to "gzip", we'll decompress the object when someone performs a media request to download it... unless 1 of 2 conditions are true -- one of which is if the requester sets the "Accept-Encoding: gzip" header on the request; then we'll serve the bytes in gzipped form.

This works as expected from the service, but when downloading a gzipped object using the Cloud Storage Python client library [2] (which uses this library), the bytes end up being written to the desired file as if they were served decoded.

It turns out, the bytes are being served gzipped-encoded, but this library is decoding them before writing them out to the desired file/stream. I used this Python script to reproduce the issue:

import os
from google.cloud import storage

BUCKET='set your bucket name here'
GZIPPED_OBJECT='set your object name here'

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/json/keyfile.json'

# The object {GZIPPED_OBJECT} contains gzipped text, and I've set its
# Content-Encoding header to the value "gzip". 
client = storage.Client()
bkt = client.get_bucket(BUCKET)
blob = bkt.get_blob(GZIPPED_OBJECT)

with open('/tmp/gzipped-from-cli-lib', 'w') as f:
  blob.download_to_file(f, client)

and before running it, I set a breakpoint directly before this line:
https://github.com/GoogleCloudPlatform/google-resumable-media-python/blob/50f4c4d22cdaea71c794639226e819197f11f555/google/resumable_media/requests/download.py#L121

...and if you get to that breakpoint and execute response.raw.read() in the debugger, it gives you the gzipped bytes for the object. The problem comes from using iter_content() -- that method should NOT be used here if response.request.headers contains the ("accept-encoding", "gzip") pair, since it will explicitly decompress the stream, as mentioned in the warning box in the webdocs for the Requests library [3]:
"""
Note
An important note about using Response.iter_content versus Response.raw. Response.iter_content will automatically decode the gzip and deflate transfer-encodings. Response.raw is a raw stream of bytes – it does not transform the response content. If you really need access to the bytes as they were returned, use Response.raw.
"""

We should be checking for that header, and if it's present, manually chunking/writing the bytes to the stream instead of using iter_content().

[1] https://cloud.google.com/storage/docs/transcoding#decompressive_transcoding
[2] https://github.com/GoogleCloudPlatform/google-cloud-python/
[3] http://docs.python-requests.org/en/master/user/quickstart/#raw-response-content

The text was updated successfully, but these errors were encountered:

dhermes · 2018-09-19T23:54:01Z

This was something I was worried could happen but never adequately followed up, thanks @houglum. There was discussion in #35 and #36 (and related issues).

houglum · 2018-09-20T03:47:14Z

Thanks for the quick response!

I'd write up a fix for this, but I've got other higher-priority work lined up for the next couple of weeks. And the bulk of my time on this would be spent looking through the tests and figuring out how to add a test for it :P

Is it likely this will be fixed anytime soon? If not, I can take a stab at it when I get the time.

Allowing 'requests' to expand / inflate the file's body defeats downstream excpectations. Closes #37. Closes #49. Closes #50.

Don't expect 'Content-Length' header for chunked downloads. Closes #37. Closes #49. Closes #50. Closes #56. Closes #76.

houglum mentioned this issue Sep 20, 2018

Storage: download_to_file() decompresses gzip'd object even when passed "Accept-Encoding: gzip" googleapis/google-cloud-python#6035

Closed

JustinBeckwith added triage me I really want to be triaged. 🚨 This issue needs some love. labels Dec 8, 2018

crwilcox mentioned this issue Dec 20, 2018

Systest Failure: 'test_chunked_download'. #56

Closed

tseaver added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Jan 16, 2019

yoshi-automation added the 🚨 This issue needs some love. label Mar 18, 2019

HemangChothani mentioned this issue Jul 31, 2019

GRMP: Fix/download forces decompression response #85

Closed

tseaver added a commit that referenced this issue Aug 14, 2019

Always use 'raw' response data.

613ea8a

Allowing 'requests' to expand / inflate the file's body defeats downstream excpectations. Closes #37. Closes #49. Closes #50.

tseaver mentioned this issue Aug 14, 2019

Always use raw response data. #87

Merged

crwilcox assigned tseaver Aug 15, 2019

tseaver closed this as completed in #87 Aug 27, 2019

tseaver added a commit that referenced this issue Aug 27, 2019

Always use raw response data. (#87)

2b9ffc8

Don't expect 'Content-Length' header for chunked downloads. Closes #37. Closes #49. Closes #50. Closes #56. Closes #76.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`_write_to_stream()` in the `Download` class forces decompression of the response even if "Accept-Encoding: gzip" was present in the request #49

`_write_to_stream()` in the `Download` class forces decompression of the response even if "Accept-Encoding: gzip" was present in the request #49

houglum commented Sep 19, 2018

dhermes commented Sep 19, 2018

houglum commented Sep 20, 2018

_write_to_stream() in the Download class forces decompression of the response even if "Accept-Encoding: gzip" was present in the request #49

_write_to_stream() in the Download class forces decompression of the response even if "Accept-Encoding: gzip" was present in the request #49

Comments

houglum commented Sep 19, 2018

dhermes commented Sep 19, 2018

houglum commented Sep 20, 2018

`_write_to_stream()` in the `Download` class forces decompression of the response even if "Accept-Encoding: gzip" was present in the request #49

`_write_to_stream()` in the `Download` class forces decompression of the response even if "Accept-Encoding: gzip" was present in the request #49