
Patron adapter does not support keep-alive #2

Open
derSascha opened this issue Jul 13, 2019 · 17 comments

Comments

@derSascha

Basic Info

  • Faraday Version: master
  • Ruby Version: 2.5.5

Issue description

The pull request lostisland/faraday#796 removed the cached Patron session to avoid issues with multi-threading (Patron sessions are not thread-safe). To use keep-alive, the session has to be reused, and that is now impossible. Can we revert the changes introduced in lostisland/faraday#796 and use e.g. a Mutex to make it thread-safe in Faraday?
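
For illustration, a minimal sketch of the mutex approach I have in mind (illustrative only, not the actual adapter code - SharedPatronSession is a made-up name):

require 'patron'

# Illustrative only: one shared session guarded by a Mutex.
class SharedPatronSession
  def initialize
    @session = Patron::Session.new
    @mutex = Mutex.new
  end

  # All requests are serialized through the single session, so the
  # underlying libcurl handle (and its keep-alive connection) is reused.
  def get(url)
    @mutex.synchronize { @session.get(url) }
  end
end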

Steps to reproduce

Test Server:

require 'webrick'

server = WEBrick::HTTPServer.new Port: 4567
server.mount_proc '/' do |req, res|
  res.body = "peerport=#{req.peeraddr[1]}"
end
server.start

Test Client:

require 'patron'

s1 = Patron::Session.new
puts "First request on s1: #{s1.get('http://localhost:4567').body}"
puts "Second request on s1: #{s1.get('http://localhost:4567').body}"
s2 = Patron::Session.new
puts "Third request on s2: #{s2.get('http://localhost:4567').body}"

Output:

First request on s1: peerport=39218
Second request on s1: peerport=39218
Third request on s2: peerport=39220

The first and second requests reuse a session and use keep-alive on the same connection (same peer port). The third request uses a new session, creates a new connection, and does not use keep-alive.

derSascha referenced this issue in derSascha/faraday Jul 13, 2019
Reuse Patron session to support keep-alive/connection reuse and make
them thread-safe using a mutex.

fixes #1001
@iMacTia
Member

iMacTia commented Jul 14, 2019

Hi @derSascha, and thanks for raising this.
We're getting deep into adapter-specific territory here, and to be honest I haven't used the Patron adapter enough to make a proper call.
The PR you're referring to (#796) was opened by @julik, who happens to be the main Patron maintainer, and its description states the exact opposite:

Contrary to what one would believe, retaining a Patron session will not lead to the libCURL handle being reused across requests to achieve keepalive.

I'm confused at this point about which of the two behaviours is actually the correct one.
Before reintroducing reusability of the Patron sessions, I'd like to get a comment from @julik as well to confirm this is actually going to change things.

@julik

julik commented Jul 14, 2019

Hey, thanks for the mention!

After I made the changes to the Faraday Patron adapter you are mentioning, I went and investigated in detail how Patron works with curl handles under the hood. It turned out that, even though it was not explicitly noted anywhere, the curl handle does get reused - it is initialized lazily on the first request and then retained until the Session object gets garbage-collected. Having investigated that (for a service at work which does not use Faraday), I wrote the paragraph @derSascha references in the PR, and also made sure that a curl handle gets initialized immediately on Session creation (basically made the code less clever).

So apologies @derSascha and @iMacTia, I didn't do my homework back then. I'm going to add a comment on the PR to clarify that.

Now, to the subject of the PR/issue. I don't know if having a global mutex is a good idea on its own. Here is why: I use Faraday as a "bridge" between Patron - our HTTP client of choice at WeTransfer - and libraries that support Faraday as their HTTP client interface. This, IMO, is the value proposition. We also ended up implementing Faraday the same way in our FormatParser gem here.

We use it in a one-shot fashion (using Faraday.get) and we use it from multiple threads.
In our work code I configure Patron as the default adapter like so:

Faraday::Adapter.register_middleware :wt_patronito => WtPatronito::FaradayAdapter
Faraday.default_connection = Faraday.new { |c| c.adapter :wt_patronito }

because I want all the libraries in use in the application to make their requests through Patron, via Faraday. The problem with having a Mutex there is that I don't know what semantics the various ways of calling Faraday have when used in a multithreaded environment. Am I going to lock the one single connection for all threads if I do Faraday.get? If I do Faraday.new and then call get on that, will it lock the connection for the duration of the get? If I do have a connection pool, how do I make the connection check a session out of that pool for use?

And most importantly, given the API surface of Faraday, is there a guarantee that other libraries that I use in my code, which use Faraday, will be abiding by the same calling conventions and locking semantics?

From what I understand through a cursory inspection, there isn't really a semantic definition of how long-lived Faraday objects - such as connection adapters and middleware - get shared between threads. Therefore I would expect Faraday to either leave cross-thread sharing to the hosting application, or to provide a known API for setting up and configuring connection pools for objects that are supposed to be shared.

Now, to the issue that @derSascha has. I agree that the setup we have ("share nothing") is not ideal - it has been made to ensure maximum isolation of separate requests. I still think that when used in an application where threading is handled "by the authorities" (Sidekiq worker threads, Rails connection pooling etc.) most people simply would not understand why, say, all their Puma threads are suddenly performing all of their HTTP requests "in sequence" and wait on each other. It will be very hard to diagnose, because it will happen only when there are multiple simultaneous jobs/requests going on.

Given that, maybe we should split the Patron connection adapter into the one we have right now, and the one you can give a ConnectionPool to - it would then check out a connection from the pool, and return it to the pool afterwards. Or we could make a generic connection pool middleware that would work for all adapters, allowing them to be reused. I don't feel like introducing a Mutex into all Patron requests is the way to go. So, summary:

  • No for the Mutex part
  • Yes for retaining the Patron session

@iMacTia is there something I'm missing regarding Faraday's threading semantics?

@julik

julik commented Jul 14, 2019

For example, something as simple as this can ensure the threading is safe:

class PooledAdapter < Faraday::Adapter
  def initialize(pool)
    @pool = pool
  end

  def call(env)
    super
    # Check out one of the pooled adapter instances, let it perform
    # the request, and return it to the pool afterwards.
    @pool.with do |faraday_adapter|
      faraday_adapter.call(env)
    end
  end
end
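
Here @pool would be e.g. a ConnectionPool from the connection_pool gem, holding whatever instances you want to share. A sketch (size and timeout values are illustrative):

require 'connection_pool'
require 'patron'

# Five shared Patron sessions; #with checks one out and blocks
# up to 3 seconds if all of them are busy.
pool = ConnectionPool.new(size: 5, timeout: 3) { Patron::Session.new }

pool.with do |session|
  session.get('http://localhost:4567')
end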

@iMacTia
Member

iMacTia commented Jul 14, 2019

Thanks @julik for the long and detailed explanation.
Just to make sure we're all on the same page, my take on this whole discussion is:

  1. The current solution creates a new session for every request. This makes it thread-safe and allows concurrent requests, but it's arguably a waste of resources and won't support keep-alive.
  2. @derSascha's proposed solution is to cache the session (which will reuse the same connection and thus support keep-alive), in combination with a Mutex to make it thread-safe. The downside is that only one request can be performed at any point in time due to the mutex, basically preventing concurrent requests.
  3. @julik's proposed solution goes one step further than @derSascha's by introducing a connection pool. This way only a limited number of sessions will be created, supporting reusability, keep-alive and thread-safety. Moreover, concurrency up to the pool size will also be supported.

If I got it right, then I agree the last solution is definitely the best one, and I like the idea of extending it to other adapters as well (something similar was raised for the net_http_persistent adapter in lostisland/faraday#793, so that might have thread-safety issues as well).

@technoweenie @olleolleolle do you guys have any insights/opinions on how connection pooling should be factored into Faraday? I was thinking we might add it to every adapter (unless there's some incompatibility) and allow the user to customise the pool size using adapter options:

conn = Faraday.new(...) do |f|
  ...
  f.adapter :patron, pool_size: 200
end

If we do that, does the net_http_persistent adapter actually become redundant?

@julik

julik commented Jul 14, 2019

Good summary! Note also that keepalive will only be attempted for requests to the same hostname, on the same port, using the same protocol. So depending on the workload you might actually not want it, or it might have interesting performance implications. For example, imagine your application dispatches webhooks to hosts that users specify. In that case, after measuring the "bulks" of hostnames, you find out that even the most popular host your users send callbacks to (say, Zapier or IFTTT) only accounts for 10% of the requests. That means that, on average, only a very small subset of requests will be able to benefit from keepalive.

Consider another scenario. You have a storage manager which retrieves data from S3. You only work within one AWS region, but you are also dispatching calls to a different service, which you run in the same region and which sits behind a load balancer. You notice that about half of the requests go to that service, and the rest to one S3 bucket. In that case, you will benefit from keepalive the most if you allocate a separate connection pool for each of these - one for calls to your other service, and one for your S3 bucket. On each of these pools you will have a near-guarantee of keepalive.

So depending on the workload, keepalive might be less effective than one would think, even though there is nothing wrong with it per se - you just "switch" the curl handle to talk to a different host/port, thereby resetting the connection.

@iMacTia
Member

iMacTia commented Jul 15, 2019

Thanks @julik. Again, if my understanding is correct, the reusability issue should be solved by creating a different Faraday connection for each service you want to call, which is the recommended way of using Faraday anyway. That means a separate stack, adapter and, consequently, connection pool for each service.
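
In other words, something like this (sketch; the URLs are made up):

# One Faraday connection per upstream service, so each keeps its own
# middleware stack, adapter and, eventually, its own connection pool.
s3_conn      = Faraday.new(url: 'https://my-bucket.s3.amazonaws.com')
service_conn = Faraday.new(url: 'https://internal-service.example.com')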

@derSascha
Author

@julik and @iMacTia, thanks for the explanations. The Mutex doesn't seem to be the best approach here and, as @julik already explained, it blocks all other requests. Is Faraday intended to be thread-safe, or is this something the user has to solve?

My app heavily uses Elasticsearch and always connects to the same endpoint. It's clear that connecting to different hosts can't work with keep-alive.

Another simple solution: keep the Patron adapter as it is at the moment (thread-safe) and add a new one like patron_session. Add documentation to the adapters noting that the normal one can't use keep-alive and that the new one is not thread-safe and should only be used by advanced users.
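
Registration could then look something like this (sketch - PatronSessionAdapter is a made-up class name):

# Hypothetical: a session-caching variant registered alongside the
# existing thread-safe :patron adapter.
Faraday::Adapter.register_middleware :patron_session => PatronSessionAdapter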

@julik

julik commented Jul 24, 2019

Maybe we can add connection pool support via an adapter option? If a connection pool is given it will be used; if not, the adapter will create and tear down the session in place. The connection pool API is really tiny (just the with method).

@julik

julik commented Jul 24, 2019

Is faraday intended to be thread-safe or is this something the user has to solve?

Something I am wondering as well. The point is not only how your code works with Faraday, but how you make 3rd-party code use your Faraday configuration correctly, regardless of the concurrency model.

@julik

julik commented Jul 24, 2019

Re. previous suggestion

f.adapter :patron, pool_size: 200

I would say that passing an existing ConnectionPool object is a better idea - first, you don't need a dependency on the connection_pool gem, and second, connection pools need configuration (like timeouts). So, more something like

f.adapter :patron, connection_pool: Rails.config.http_client_pool
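
where the pool would be configured once at boot and passed in, e.g. (sketch assuming the proposed connection_pool option existed; the constant name and values are illustrative):

require 'connection_pool'
require 'patron'

# Built once at boot and shared by every Faraday connection.
HTTP_CLIENT_POOL = ConnectionPool.new(size: 25, timeout: 5) { Patron::Session.new }

conn = Faraday.new { |f| f.adapter :patron, connection_pool: HTTP_CLIENT_POOL }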

@derSascha
Author

@julik Something like this?

  def call(env)
    super

    if @connection_options[:session_pool]
      @connection_options[:session_pool].with do |session|
        perform_request(session, env)
      end
    else
      perform_request(::Patron::Session.new, env)
    end
  end

  def perform_request(session, env)
    ...

@julik

julik commented Jul 24, 2019

Yes, or even

pool = @connection_options.fetch(:connection_pool, NullPool)
pool.with do |session|
   ...

I would call it connection_pool to signal that the connection_pool gem can be used directly, and a NullPool would be

module NullPool
  def self.with
    session = Patron::Session.new
    yield(session)
  end
end

avoiding an extra conditional.

@iMacTia
Member

iMacTia commented Jul 24, 2019

@julik @derSascha after discussing internally with the core team, we decided to give the connection pool a go. The decision was to make Faraday use a connection_pool internally, so you won't need any extra configuration - it will just work. I'm experimenting with that at the moment.

In its current shape, connection pool options would be provided instead of a whole connection pool object: that should allow you to set things like size and timeout, together with any other option the connection_pool gem provides.
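
Roughly like this (hypothetical option names, nothing is final yet):

conn = Faraday.new(url: 'http://localhost:4567') do |f|
  # options forwarded internally to ConnectionPool.new
  f.adapter :patron, pool: { size: 5, timeout: 3 }
end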

@derSascha
Author

@iMacTia any updates on this?

Test server:

require 'webrick'

server = WEBrick::HTTPServer.new Port: 4567
server.mount_proc '/' do |req, res|
  res.body = "peerport=#{req.peeraddr[1]}"
end 
server.start

Test client:

conn = Faraday.new(url: 'http://localhost:4567')
conn.adapter :patron
puts conn.get('/').body
puts conn.get('/').body
puts conn.get('/').body

Results in

peerport=41712
peerport=41714
peerport=41716

Still seems to be broken...

@julik

julik commented Feb 11, 2021

@derSascha If you do this with raw Patron do you observe the same result? You need to use the same Session object.

@derSascha
Author

@julik if you change this line to something like session = @session ||= ::Patron::Session.new it already works, but this is not thread-safe. See PR lostisland/faraday#1002

@iMacTia
Member

iMacTia commented Feb 14, 2021

I'm sorry @derSascha, I started the work to introduce the connection pool in lostisland/faraday#1006 but never got to the end of it, because at the moment all adapters live in Faraday and it's quite a pain to make it work for all of them.
This is one of those things that will definitely be easier once we move the adapters out of Faraday, a process that is already ongoing 👍!

@iMacTia iMacTia transferred this issue from lostisland/faraday Jan 2, 2022