iter_content() memory leak + raw read problems #1401

Closed
foxx opened this issue Jun 1, 2013 · 4 comments

Comments

@foxx

foxx commented Jun 1, 2013

The following code will cause a memory leak:

import requests

# stream the body and write each 128-byte chunk straight to disk
r = requests.get('http://cachefly.cachefly.net/100mb.test', stream=True)
with open('/tmp/test.txt', 'wb') as f:
    for chunk in r.iter_content(128):
        f.write(chunk)

To work around this, I had to read from r.raw directly instead, with decode_content enabled:

import requests

r = requests.get('http://cachefly.cachefly.net/100mb.test', stream=True)
while True:
    print(r.raw)
    # read decoded (un-gzipped) data directly from the underlying urllib3 response
    d = r.raw.read(512, decode_content=True)
    if not d:
        break
Cal

@foxx
Author

foxx commented Jun 1, 2013

Also, if the remote side is using gzip and you specify an iter_content size of 1, then the result is blank because the decoder cannot produce any output from a single byte at a time. This is a bit misleading given that the default size is 1.
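
To make this concrete, here is a small standalone sketch (using zlib directly and synthetic data, rather than Requests/gzip, so purely illustrative) of the failure mode: the first byte-at-a-time feeds to a decompressor return nothing at all, so a reader that treats an empty chunk as end-of-stream gives up with a blank result.

import zlib

# synthetic compressed payload, fed to the decompressor one byte at a time
compressed = zlib.compress(b"hello world" * 1000)
decoder = zlib.decompressobj()

for i in range(10):
    out = decoder.decompress(compressed[i:i + 1])
    # the first feeds return b'' because a single byte isn't enough to decode anything
    print(i, repr(out))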

If you use .raw.read() you get back the still-compressed bytes because of the gzip encoding. By default, I think it should use decode_content=True, as this is again a bit confusing. But as Response.raw is the urllib3 response object, this is difficult to change from the Requests side. Perhaps we should expose a read() method instead?

I had to go trawling through the code to understand this, so I think these warrant a fix, or at the very least a docs patch.

Personally, my suggestions would be:

  • Implement read() on the Response (instead of making callers use raw.read()) so we can forcibly enable decode_content, OR patch the docs to explain this (see the rough sketch after this list)
  • Change the default read size to enforce a minimum of X bytes if gzip/another encoding is being used.
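
For the first suggestion, a rough sketch (hypothetical code, not the actual Requests internals) of what I have in mind:

# hypothetical method on requests.models.Response, shown in isolation
def read(self, amt=None):
    # delegate to the underlying urllib3 response, but force decoding on,
    # so gzip/deflate bodies come back usable without callers touching .raw
    return self.raw.read(amt, decode_content=True)

That way r.read(512) would behave like the raw.read(512, decode_content=True) call above.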

Thoughts?

@Lukasa
Member

Lukasa commented Jun 3, 2013

Hi @foxx, thanks for opening this issue! Let me address these in no particular order.

  1. Implementing .read(). Requests explicitly provides the underlying transport library's notion of a response in the Response.raw property. If you believe that decode_content=True should be the default behaviour (and I agree that it should), I recommend you open a pull request to that effect on urllib3. I'm sure @shazow will be happy to take a look at it. =) (There's a short example after this list of what the current explicit opt-in looks like.)
  2. Changing the default read size is a bit of an odder one. Given the point I made in 1, we probably can't do it in Requests itself. I would also argue that any method named read() should behave like a file-like object, which doesn't have a minimum read value. When I get internet in my new house, I'm going to take a look at implementing the idea in urllib3/urllib3#186 ("Be more obvious when out of data"), which should be able to paper over this issue.
  3. Default iter_content() size is a sore topic around here: see #844 ("umbrella ticket to resolve iteration / read size / chunked encoding questions"), which has still not been completely resolved. Your input is welcome there. =)
  4. Where is the memory leak in iter_content()?
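
For reference, this is roughly what the current explicit opt-in looks like when talking to urllib3 directly (a sketch against the urllib3 API of the time; the URL is just the test file from above):

import urllib3

http = urllib3.PoolManager()
resp = http.request('GET', 'http://cachefly.cachefly.net/100mb.test',
                    preload_content=False)
chunk = resp.read(512, decode_content=True)  # decoding must be requested explicitly
resp.release_conn()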

@foxx
Author

foxx commented Jun 3, 2013

Thanks for the quick reply!

  1. Ticket raised: urllib3/urllib3#189 ("raw.read() unexpected behavior with gzip compression - needs docs")
  2. Ticket raised: urllib3/urllib3#190 ("iter_content(size=1) behaves oddly with gzip")
  3. Added some comments onto #844 ("umbrella ticket to resolve iteration / read size / chunked encoding questions")

In reference to point 4, that's an excellent question. I wrote a small test outside of my project code base and the problem disappeared. I even tried wrapping it with eventlet and still couldn't reproduce it. So it would seem the memory leak is specific to something else I'm doing. Until I can figure that out, I guess the ticket can be marked as closed for now.

For future reference, the code used to disprove the memory leak was:

import eventlet
eventlet.monkey_patch(all=True)
import requests


def lol():
    # stream the 100MB test file to disk inside an eventlet green thread
    f = open('/tmp/test.txt', 'wb')
    print("sending request")
    r = requests.get('http://cachefly.cachefly.net/100mb.test', stream=True)
    print("reading..")
    for chunk in r.iter_content(128):
        f.write(chunk)
    f.close()

e = eventlet.spawn(lol)
e.wait()

Apologies for this; I really should have done this beforehand.

Cal

@foxx foxx closed this as completed Jun 3, 2013
@Lukasa
Member

Lukasa commented Jun 4, 2013

No need to apologise, that's why I followed up. =) Thanks for raising this!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021