http: A way to limit number of concurrent requests per endpoint #9213
We have gRPC (HTTP/2) and HTTP/1 Python setups in production where a single sidecar load balances across many local Python processes. Each Python process is bound to a dedicated port.
What is a potential way to make Envoy send at most one concurrent request to each of these Python processes?
Ideally Envoy would not be limited to a single worker thread (to keep outgoing requests scalable, since a single thread may not be able to handle them all).

Comments
@euroelessar per conversation IRL, I'm not positive there is an OOB way of doing this today. I think we can likely do this with active/pending request circuit breakers (one cluster per port), but I think the connection pools might need a bit of modification (in slightly different ways for HTTP/1 and HTTP/2). I also take it that you want a single shared request limit across both H1 and H2? This is also related to #9215, which I was just discussing with @tonya11en today. Obviously this could also be done with a custom filter and in-filter queuing if you want to go that route.
Circuit breakers are based on atomic gauges, so they are best effort and can potentially send two concurrent requests to a single endpoint if Envoy has more than a single worker thread.
Each setup is either H1 or H2; we don't mix different protocols within a single sidecar.
It looks like it may work, assuming each endpoint is an individual cluster. Effectively all LB logic will be in a custom HTTP stream filter. One caveat, though, is that every route action has to be aware of the full enumeration of clusters. As an alternative suggestion: is it possible to make the Envoy load balancer implementations track state? Currently they operate based on stats; is it feasible to give them a "request ended" callback for each "Pick" call?
To track state in the Envoy load balancer implementations, you'd need to perform a conditional increment of an integer representing the number of outstanding requests. In your case of only allowing a single concurrent request, something like the sketch below:
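(A minimal standalone sketch of that conditional increment; the counter, the helper names, and the cap of one are illustrative assumptions, not actual Envoy members.)

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-host counter of in-flight requests.
std::atomic<uint32_t> active_requests_{0};

// Conditional increment: admit a request only when nothing is outstanding.
bool tryAcquireHost() {
  uint32_t expected = 0;
  // Atomically moves the counter 0 -> 1. A concurrent caller sees the CAS
  // fail, and the load balancer would have to pick a different host.
  return active_requests_.compare_exchange_strong(expected, 1);
}

// Hooked to the "request ended" callback discussed above.
void onRequestEnded() { active_requests_.fetch_sub(1); }
```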
If that doesn't work for you, I think a custom filter is the only approach here if you absolutely must have strict consistency.
Yeah, you would just look up the cluster in the filter and then keep track of the number of upstream requests going there, and deal with holding, overflow, etc. in the filter (see the sketch below). I don't think it would be that hard to implement. I do think that there might be use for a general queuing filter, but it would need more thinking.
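A rough standalone sketch of that per-cluster bookkeeping, kept separate from Envoy's real filter API (the class, methods, and wiring below are hypothetical):

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-cluster gate a custom filter could consult before
// forwarding a request upstream.
class ClusterConcurrencyGate {
public:
  explicit ClusterConcurrencyGate(uint32_t max_concurrent)
      : max_concurrent_(max_concurrent) {}

  // Called when the filter sees a request headed for `cluster`. Returns true
  // if the request may proceed; false means hold (queue) or reject it.
  bool tryAcquire(const std::string& cluster) {
    std::atomic<uint32_t>& count = counterFor(cluster);
    uint32_t current = count.load();
    while (current < max_concurrent_) {
      if (count.compare_exchange_weak(current, current + 1)) {
        return true;
      }
      // CAS failure refreshed `current`; loop to re-check against the limit.
    }
    return false;
  }

  // Called from the filter's stream-complete / stream-reset path.
  void release(const std::string& cluster) { counterFor(cluster).fetch_sub(1); }

private:
  std::atomic<uint32_t>& counterFor(const std::string& cluster) {
    // NOTE: a real filter would need to synchronize map mutation across
    // worker threads (a mutex, or a map pre-populated from config); omitted
    // here to keep the sketch short.
    return counters_.try_emplace(cluster, 0).first->second;
  }

  const uint32_t max_concurrent_;
  std::unordered_map<std::string, std::atomic<uint32_t>> counters_;
};
```

The "holding" part would sit on top of this: on a false return, the filter could buffer the stream and retry acquisition once release() fires for that cluster, rather than rejecting outright.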
There is a proposal to allow adding custom load balancers. See #5598. We already allow custom load balancers via custom clusters, so I don't think this would be very hard to implement if you want to go this route. cc @snowp for additional thoughts.
I haven't given this a ton of thought, but I imagine we can do something like this to improve the consistency as we're doing an atomic write + read:
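A minimal sketch of that pattern, assuming a hypothetical per-host `outstanding_requests_` counter and a limit of one active request:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-host counter; a limit of one active request is assumed.
std::atomic<uint32_t> outstanding_requests_{0};

// Optimistically claim a slot (the atomic write), then check whether we were
// first (the read of the returned previous value).
bool tryPickHost() {
  if (outstanding_requests_.fetch_add(1) == 0) {
    return true;  // We were the only claimant; use this host.
  }
  // Someone else already holds the slot: undo the increment and have the
  // load balancer try another host.
  outstanding_requests_.fetch_sub(1);
  return false;
}

void onRequestEnded() { outstanding_requests_.fetch_sub(1); }
```

Because fetch_add returns the previous value, the claim and the check happen as one atomic step, so two worker threads can never both observe zero and proceed — exactly the race the gauge-based circuit breakers allow.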
If that solves the consistency issue, then either doing one endpoint per cluster or introducing per-endpoint circuit breakers might be the way to achieve 1 active request per endpoint.
It's acceptable for us to have our own LB extension which would pay the contention cost; there is no need to add this burden to widely used load balancing policies.
Will check how hard it is; at first glance it has the side effect of producing separate per-endpoint stats, though.
There is a bunch of discussion now around custom LB extensions (vs. the LB/cluster pair we have now). See #5598. cc @snowp who has been thinking about this a bunch.
A hacky solution would be to set a header, etc. and then refresh the route cache to force the request to go to the cluster you care about. It's possible you could achieve something similar with the subset LB, but I don't have any experience with that. A better solution would be to allow a filter to actually provide the route that the request should use during subsequent processing. This has been discussed several times.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.