iter_content() memory leak + raw read problems #1401

Closed
foxx opened this issue Jun 1, 2013 · 4 comments

Comments

@foxx

foxx commented Jun 1, 2013

The following code will cause a memory leak:

import requests

# stream the body and write each 128-byte chunk straight to disk
r = requests.get('http://cachefly.cachefly.net/100mb.test', stream=True)
with open('/tmp/test.txt', 'wb') as f:
    for chunk in r.iter_content(128):
        f.write(chunk)

To work around this, I had to read from r.raw directly instead, with decode_content enabled:

import requests

r = requests.get('http://cachefly.cachefly.net/100mb.test', stream=True)
while True:
    print(r.raw)
    # read decoded (un-gzipped) data directly from the underlying urllib3 response
    d = r.raw.read(512, decode_content=True)
    if not d:
        break
Cal

@foxx
Author

foxx commented Jun 1, 2013

Also, if the remote side is using gzip and you specify an iter_content size of 1, then the result is blank because the decoder cannot produce any output from a single byte at a time. This is a bit misleading given that the default size is 1.
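
To make this concrete, here is a small standalone sketch (using zlib directly and synthetic data, rather than Requests/gzip, so purely illustrative) of the failure mode: the first byte-at-a-time feeds to a decompressor return nothing at all, so a reader that treats an empty chunk as end-of-stream gives up with a blank result.

import zlib

# synthetic compressed payload, fed to the decompressor one byte at a time
compressed = zlib.compress(b"hello world" * 1000)
decoder = zlib.decompressobj()

for i in range(10):
    out = decoder.decompress(compressed[i:i + 1])
    # the first feeds return b'' because a single byte isn't enough to decode anything
    print(i, repr(out))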

If you use .raw.read() you get back the still-compressed bytes because of the gzip encoding. By default, I think it should use decode_content=True, as this is again a bit confusing. But as Response.raw is the urllib3 response object, this is difficult to change from the Requests side. Perhaps we should expose a read() method instead?

I had to go trawling through the code to understand this, so I think these warrant a fix, or at the very least a docs patch.

Personally, my suggestions would be:

  • Implement read() on the Response (instead of making callers use raw.read()) so we can forcibly enable decode_content, OR patch the docs to explain this (see the rough sketch after this list)
  • Change the default read size to enforce a minimum of X bytes if gzip/another encoding is being used.
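
For the first suggestion, a rough sketch (hypothetical code, not the actual Requests internals) of what I have in mind:

# hypothetical method on requests.models.Response, shown in isolation
def read(self, amt=None):
    # delegate to the underlying urllib3 response, but force decoding on,
    # so gzip/deflate bodies come back usable without callers touching .raw
    return self.raw.read(amt, decode_content=True)

That way r.read(512) would behave like the raw.read(512, decode_content=True) call above.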

Thoughts?

@Lukasa
Member

Lukasa commented Jun 3, 2013

Hi @foxx, thanks for opening this issue! Let me address these in no particular order.

  1. Implementing .read(). Requests explicitly provides the underlying transport library's notion of a response in the Response.raw property. If you believe that decode_content=True should be the default behaviour (and I agree that it should), I recommend you open a pull request to that effect on urllib3. I'm sure @shazow will be happy to take a look at it. =) (There's a short example after this list of what the current explicit opt-in looks like.)
  2. Changing the default read size is a bit of an odder one. Given the point I made in 1, we probably can't do it in Requests itself. I would also argue that any method named read() should behave like a file-like object, which doesn't have a minimum read value. When I get internet in my new house, I'm going to take a look at implementing the idea in urllib3/urllib3#186 ("Be more obvious when out of data"), which should be able to paper over this issue.
  3. Default iter_content() size is a sore topic around here: see #844 ("umbrella ticket to resolve iteration / read size / chunked encoding questions"), which has still not been completely resolved. Your input is welcome there. =)
  4. Where is the memory leak in iter_content()?
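
For reference, this is roughly what the current explicit opt-in looks like when talking to urllib3 directly (a sketch against the urllib3 API of the time; the URL is just the test file from above):

import urllib3

http = urllib3.PoolManager()
resp = http.request('GET', 'http://cachefly.cachefly.net/100mb.test',
                    preload_content=False)
chunk = resp.read(512, decode_content=True)  # decoding must be requested explicitly
resp.release_conn()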

@foxx
Author

foxx commented Jun 3, 2013

Thanks for the quick reply!

  1. Ticket raised: urllib3/urllib3#189 ("raw.read() unexpected behavior with gzip compression - needs docs")
  2. Ticket raised: urllib3/urllib3#190 ("iter_content(size=1) behaves oddly with gzip")
  3. Added some comments onto #844 ("umbrella ticket to resolve iteration / read size / chunked encoding questions")

In reference to point 4, that's an excellent question. I wrote a small test outside of my project code base and the problem disappeared. I even tried wrapping it with eventlet and still couldn't reproduce it. So it would seem the memory leak is specific to something else I'm doing. Until I can figure that out, I guess the ticket can be marked as closed for now.

For future reference, the code used to disprove the memory leak was:

import eventlet
eventlet.monkey_patch(all=True)
import requests


def lol():
    # stream the 100MB test file to disk inside an eventlet green thread
    f = open('/tmp/test.txt', 'wb')
    print("sending request")
    r = requests.get('http://cachefly.cachefly.net/100mb.test', stream=True)
    print("reading..")
    for chunk in r.iter_content(128):
        f.write(chunk)
    f.close()

e = eventlet.spawn(lol)
e.wait()

Apologies for this; I really should have done this beforehand.

Cal

@foxx foxx closed this as completed Jun 3, 2013
@Lukasa
Member

Lukasa commented Jun 4, 2013

No need to apologise, that's why I followed up. =) Thanks for raising this!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021