Patron adapter does not support keep-alive #2
Reuse Patron session to support keep-alive/connection reuse and make them thread-safe using a mutex. fixes #1001
Hi @derSascha and thanks for raising this.
I'm confused at this point about which of the two behaviours is actually the correct one.
Hey, thanks for the mention! After I made the changes to the Faraday Patron adapter you are mentioning, I actually went and investigated in detail how Patron works with curl handles under the hood. It turned out that even though it was not explicitly noted anywhere, the curl handle would get reused: it would be initialized lazily on the first request and then retained until the Session object gets garbage-collected. Having investigated that (for a service at work which does not use Faraday), I wrote up the paragraph @derSascha references in the PR, and also made sure that a curl handle gets initialized immediately on Session create (basically made the code less clever). So apologies @derSascha and @iMacTia, I didn't do my homework back then. I'm going to add a comment on the PR to clarify that.

Now, to the subject of the PR/issue. I don't know if having a global mutex is a good idea on its own. Here is why: I use Faraday as a "bridge through" between Patron - which is our preferred HTTP client of choice at WeTransfer - and libraries that support Faraday as their HTTP client interface. This, IMO, is the value proposition. We also ended up implementing Faraday the same way in our FormatParser gem. We use it in a one-shot fashion (using …) because I want all the libraries that I have in use in the application to make their requests through Patron, via Faraday.

The problem with having a Mutex there is that I don't know what semantics the various ways of calling Faraday have when used in a multithreaded environment. Am I going to lock the one single connection for all threads if I do …? And most importantly, given the API surface of Faraday, is there a guarantee that other libraries that I use in my code, which use Faraday, will abide by the same calling conventions and locking semantics? From what I understand through a cursory inspection, there isn't really a semantic definition of how long-lived Faraday objects - such as connection adapters and middleware - get shared between threads. Therefore I would expect Faraday to either leave cross-thread sharing to the hosting application, or to provide a known API for setting up and configuring connection pools for objects that are supposed to be shared.

Now, to the issue that @derSascha has. I agree that the setup we have ("share nothing") is not ideal - it has been made to ensure maximum isolation of separate requests. I still think that when used in an application where threading is handled "by the authorities" (Sidekiq worker threads, Rails connection pooling etc.), most people simply would not understand why, say, all their Puma threads are suddenly performing all of their HTTP requests "in sequence" and waiting on each other. It will be very hard to diagnose, because it will happen only when there are multiple simultaneous jobs/requests going on. Given that, maybe we should split the Patron connection adapter into the one we have right now, and one you can give a connection pool to.

@iMacTia is there something I'm missing regarding Faraday's threading semantics?
For example, something as simple as this can ensure the threading is safe:

```ruby
class PooledAdapter < Faraday::Adapter
  def initialize(app, pool)
    super(app)
    @pool = pool
  end

  def call(env)
    super
    # Check a pre-built adapter out of the pool for the duration of the
    # request, so no two threads ever share the same underlying session.
    @pool.with do |faraday_adapter|
      faraday_adapter.call(env)
    end
  end
end
```
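A hypothetical way to build the pool for that sketch, assuming the `connection_pool` gem (the sizes, and the `nil` placeholder app argument, are illustrative only):

```ruby
require "connection_pool"

# Each checked-out adapter owns its own Patron session, so two threads
# can never run requests on the same session at the same time.
adapter_pool = ConnectionPool.new(size: 5, timeout: 3) do
  Faraday::Adapter::Patron.new(nil) # nil stands in for the downstream app
end
```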
Thanks @julik for the long and detailed explanation.
If I got it right, then I agree the last solution is definitely the best one, and I like the idea of extending it to other adapters as well (something similar was raised for the net_http_persistent adapter in lostisland/faraday#793, so that might have thread-safety issues as well). @technoweenie @olleolleolle do you guys have any insights/opinions on how the connection pooling should be factored into Faraday? I was thinking we might add it to every adapter (unless there's some incompatibility) and allow the user to customise the pool size by using adapter options:

```ruby
conn = Faraday.new(...) do |f|
  ...
  f.adapter :patron, pool_size: 200
end
```

If we do that, does the …
Good summary! Note also that keepalive will be attempted only if you do requests to the same hostname, same port and using the same protocol. So depending on the workload you might actually not want to have it, or it might have interesting performance implications.

For example, imagine your application dispatches webhooks to hosts that users specify. In this case, after measuring the "bulks" of hostnames, you find out that even the most popular host that your users send callbacks to (say, Zapier or IFTTT) only accounts for 10% of the requests. It means that on average a very small subset of requests are going to be able to benefit from keepalive.

Consider another scenario. You have a storage manager which retrieves data from S3. You only work within one AWS region, but you are also dispatching calls to a different service, which you run in the same region and which sits behind a load balancer. You notice that about half of the requests go to that service, and the rest to one S3 bucket. In that case, you will benefit from keepalive the most if you allocate separate connection pools for each of these - one for calls to your other service, and one for your S3 bucket. On each of these pools you will have a near-guarantee of keepalive.

So depending on the workload, keepalive might be less effective than one would think, even though there is nothing wrong with it - you just "switch" the curl handle to talk to a different host/port, thereby resetting the connection.
Thanks @julik. Again, if my understanding is correct, then the issue with reusability should be solved by creating a different Faraday connection for each service you want to call, which is the recommended way of using Faraday anyway. That means a different stack, adapter and consequently connection pool for each service.
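A sketch of that per-service split (hostnames are made up): each destination gets its own Faraday connection, and therefore its own adapter and eventual pool:

```ruby
require "faraday"

# One connection per destination keeps keep-alive effective per host.
s3_conn      = Faraday.new("https://my-bucket.s3.amazonaws.com") { |f| f.adapter :patron }
service_conn = Faraday.new("https://internal-service.example.com") { |f| f.adapter :patron }
```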
@julik and @iMacTia, thanks for the explanations. The Mutex seems not to be the best way here and, as already explained by julik, it blocks all other requests. Is Faraday intended to be thread-safe, or is this something the user has to solve? My app heavily uses Elasticsearch and always connects to the same endpoint. It's clear that connecting to different hosts can't work with keep-alive. Another simple solution: keep the Patron adapter how it is at the moment (thread-safe) and add a new one like …
Maybe we can add connection pool support via an adapter option? If a connection pool is given it will use it; if not, it will create and tear down the session in-place. The connection pool API is really tiny (the `with` method is essentially all of it, as in the `PooledAdapter` sketch above).
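A minimal sketch of that conditional, assuming a hypothetical `pool:` adapter option and a made-up `with_session` helper inside the adapter:

```ruby
require "patron"

# Borrow a pooled session when a pool was configured, otherwise fall back
# to the current one-shot behaviour (new session per request).
def with_session(&block)
  if @pool
    @pool.with(&block) # long-lived sessions: keep-alive can kick in
  else
    yield ::Patron::Session.new # isolated session, discarded after the call
  end
end
```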
Something I am wondering as well. The point is not only how your code works with Faraday, but how you make third-party code use your Faraday configuration correctly, regardless of the concurrency model.
Re. the previous suggestion: I would say that passing an existing ConnectionPool object is a better idea - first, you don't need a dependency on the connection_pool gem, and second, connection pools need configuration (like timeouts). So more something like …
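A sketch of what passing a pre-built pool could look like (the `pool:` option is hypothetical; sizes are illustrative):

```ruby
require "connection_pool"
require "patron"

# The application owns and configures the pool; Faraday just borrows from it.
patron_pool = ConnectionPool.new(size: 10, timeout: 5) { Patron::Session.new }

conn = Faraday.new("https://api.example.com") do |f|
  f.adapter :patron, pool: patron_pool
end
```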
@julik Something like this? …
Yes, or even … I would call it …, avoiding an extra conditional.
@julik @derSascha after discussing internally with the core team, we decided to give the connection pool a go. The decision was to make Faraday use a connection_pool internally, so you don't need to do any extra configuration; it will just work. I'm experimenting with that at the moment. In its current shape, connection pool options would be provided instead of a whole connection pool object: that should allow you to set things like size and timeout, together with any other option that the underlying pool constructor supports.
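If it ships that way, configuration might end up looking roughly like this (the option names and shape are assumptions, not a finalized API):

```ruby
conn = Faraday.new("https://api.example.com") do |f|
  # Options forwarded to the internally managed pool (hypothetical).
  f.adapter :patron, pool_size: 20, pool_timeout: 5
end
```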
@iMacTia any updates on this?

Test server: …

Test client: …

Results in: …

Seems to be still broken...
@derSascha If you do this with raw Patron, do you observe the same result? You need to use the same `Patron::Session` for all requests.
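For reference, a raw-Patron check could look like this (URL and port are made up):

```ruby
require "patron"

# One Session reused for every request: Patron retains the curl handle,
# so the TCP connection should stay open between calls.
session = Patron::Session.new
session.base_url = "http://localhost:9292"
3.times { session.get("/") }
```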
@julik if you change this line to something like …
I'm sorry @derSascha, I started the work to introduce the connection pool in lostisland/faraday#1006, but never got to the end of it because at the moment all adapters live in Faraday and it's quite a pain to make it work for all of them.
Basic Info
Issue description
The pull request lostisland/faraday#796 removed the cached Patron session to avoid issues with multi-threading (Patron sessions are not thread-safe). To use keep-alive, the session has to be reused, and this is now impossible... Can we revert the changes introduced in lostisland/faraday#796 and use e.g. a Mutex to make it thread-safe in Faraday?
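A sketch of what the proposed mutex guard might look like inside the adapter (names are illustrative; this is the approach the comments above argue serializes all requests):

```ruby
SESSION_MUTEX = Mutex.new # guards the one cached session

def call(env)
  super
  SESSION_MUTEX.synchronize do
    # Reuse one cached session so the curl handle (and thus the TCP
    # connection) survives between requests: keep-alive works, but every
    # request in the process now waits on this lock.
    @session ||= ::Patron::Session.new
    # ... configure @session from env and perform the request ...
  end
  @app.call(env)
end
```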
Steps to reproduce
Test Server: …

Test Client: …

Output: …

The first and second requests reuse a session and use keep-alive on the same connection. The third request uses a new session, creates a new connection, and does not use keep-alive.
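The server and client snippets are missing above; a minimal sketch of the kind of reproduction described, with the port, wiring and output entirely assumed:

```ruby
# Test server sketch: logs every new TCP connection and answers with
# keep-alive, so connection reuse is directly visible.
require "socket"

server = TCPServer.new(9292)
loop do
  Thread.new(server.accept) do |sock|
    puts "new connection"
    while sock.gets("\r\n\r\n") # read one request's headers (GET, no body)
      sock.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: keep-alive\r\n\r\nok"
    end
    sock.close
  end
end
```

```ruby
# Test client sketch: three requests through one Faraday connection.
require "faraday"
require "patron"

conn = Faraday.new("http://localhost:9292") { |f| f.adapter :patron }
3.times { conn.get("/") }
# With a reused session the server logs "new connection" once;
# with a fresh session per request it logs it for every request.
```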