Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP::Client overwrites body if it isn't read before another request is performed #371

Open
backus opened this issue Aug 23, 2016 · 12 comments

Comments

@backus
Copy link

backus commented Aug 23, 2016

If I use HTTP to perform two requests in a row and I don't do response.body.to_s on the first then the body string for the first response will be overwritten with "".

Here is a script that demonstrates the bug:

gem 'http', '2.0.3'
require 'http'

URL = Addressable::URI.parse('http://mockbin.org/bin/1dee24cd-defa-467a-9c5f-6bccaadca3ad')

puts "Running with http.rb version #{HTTP::VERSION}"

client = HTTP.headers('X-Some-Config' => 'Whatever')

puts "I'm going to make 4 requests in total. All of them are POST requests to the url you gave me."

puts "After the first and second request I'm going to print the response body immediately."

puts "I'm then going to make the third and fourth request back to back" \
     "without touching the response body of request #3"

puts "Request #1 and #3 should have the same response body but the fourth request" \
     "seems to clobber the body string"

puts

puts "Performing request #1..."
response1 = client.post(URL)
puts "request #1 response: #{response1.body.to_s.inspect}"
puts

puts "Performing request #2..."
response2 = client.post(URL)
puts "request #2 response: #{response2.body.to_s.inspect}"
puts

puts "Performing request #3..."
response3 = client.post(URL)
puts "Performing request #4..."
response4 = client.post(URL)

puts
puts "Request #3 and #4 done"

puts "Response body for request #1: #{response1.body.to_s.inspect}"
puts "Response body for request #3: #{response3.body.to_s.inspect}"

Output when I run the script:

Running with http.rb version 2.0.3 with ruby 2.3.1
I'm going to make 4 requests in total. All of them are POST requests to the url you gave me.
After the first and second request I'm going to print the response body immediately.
I'm then going to make the third and fourth request back to backwithout touching the response body of request #3
Request #1 and #3 should have the same response body but the fourth requestseems to clobber the body string

Performing request #1...
request #1 response: "Good job you made a POST request!"

Performing request #2...
request #2 response: "Good job you made a POST request!"

Performing request #3...
Performing request #4...

Request #3 and #4 done
Response body for request #1: "Good job you made a POST request!"
Response body for request #3: ""
@tarcieri
Copy link
Member

An empty string is definitely unexpected and undesirable in this case, however I don't think it makes sense to eagerly read bodies clients aren't interested in.

I think it would make sense to raise an exception in this case.

@backus
Copy link
Author

backus commented Aug 23, 2016

@tarcieri Ok so the issue in my case was that we have an integration test like this:

# creating the resource
response1 = client.post(...)

# getting the resource
response2 = client.get(response1.headers['Content-Location'])

expect(response1.body.to_s).to eql(response2.body.to_s)

Are you saying that this shouldn't be allowed behavior or should raise an exception?

@tarcieri
Copy link
Member

tarcieri commented Aug 24, 2016

I think if you make a request, do not consume the body, make another request with the same client, and then try to consume the body for the original request, it should raise an exception.

If you would like to consume the body for the original request, it should happen before you make a subsequent request, IMO. Otherwise the client needs to both consume and hang onto any response bodies simply because you might consume them at some point in the future.

Users uninterested in the response bodies would probably file a bug for that, calling it a "memory leak", and I would agree with them.

@backus
Copy link
Author

backus commented Aug 24, 2016

@tarcieri I think that sounds surprising to me because my impression of HTTP.rb was that it aimed to provide an immutable and chainable interface for performing requests. While I agree that raising an error would be better than what I've reported it seems like a bandaid. My impression was that the chaining methods were providing me new instances without shared mutable state.

Basically, just like how these two return values (ret1 and ret2) should not be able to mutate each other

client  = HTTP.accept('application/blah')

ret1 = client.accept("application/json")
ret2 = client.basic_auth(user: 'foo', password: 'bar')

it seems like these two return values should not be allowed to mutate each other:

client  = HTTP.accept('application/blah')

ret1 = client.get("https://google.com")
ret2 = client.get("https://github.com")

The usage of either of these two response objects should not alter the behavior/state of the other.

@backus
Copy link
Author

backus commented Aug 24, 2016

At the very least it would be nice if I could have something like response.finalize that I can call and get a simple immutable response wrapper that isn't tied to a connection object.

@tarcieri
Copy link
Member

Calling #to_s on the body prior to making another connection will accomplish that.

Yes, I will admit this behavior is a little bit surprising even to me.

@ixti
Copy link
Member

ixti commented Aug 24, 2016

there's helper for that too:

client  = HTTP.accept('application/blah')

ret1 = client.get("https://google.com").flush
ret2 = client.get("https://github.com").flush

ret1.to_s[0..42]
# => "<HTML><HEAD><meta http-equiv=\"content-type\""

ret2.to_s[0..42]
# => " "

@ixti
Copy link
Member

ixti commented Aug 24, 2016

flush consumes body and returns response itself

@backus
Copy link
Author

backus commented Aug 24, 2016

Would it introduce performance implications and/or breaking changes if each response object had a separate connection object?

@backus
Copy link
Author

backus commented Aug 24, 2016

Also thank you @ixti that is helpful

@ixti
Copy link
Member

ixti commented Aug 24, 2016

It will be (probably) a breaking change. And yeah, I guess it will make it easier to shoot the foot for one. But in general I think it's pretty doable and might become a pretty good improvement.

@britishtea
Copy link
Contributor

I think holding on to the last pending response in HTTP::Connection could solve this problem.

The connection already tracks if a response is pending (i.e. the body is not read yet) and has enough information to release its reference to the response to avoid a memory leak (#finish_response, #close).

If @pending_response (or an additional instance variable) were to contain an actual HTTP::Response instead of a bool the connection could call #flush on it when it's asked to send a new request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants