
http: A way to limit number of concurrent requests per endpoint #9213

Closed
euroelessar opened this issue Dec 3, 2019 · 9 comments
Labels
design proposal (Needs design doc/proposal before implementation), stale (stalebot believes this issue/PR has not been touched recently)

Comments

@euroelessar
Contributor

We have gRPC (HTTP/2) and HTTP/1 Python setups in production where a single sidecar load-balances across many local Python processes. Each Python process is bound to a dedicated port.

What is a potential way to make Envoy send at most one concurrent request to each of these Python processes?
Ideally Envoy would not be limited to a single worker thread; outgoing requests need to scale, and a single thread may not be able to handle them all.

@mattklein123 added the question (Questions that are neither investigations, bugs, nor enhancements) and design proposal (Needs design doc/proposal before implementation) labels and removed the question label Dec 4, 2019
@mattklein123
Member

@euroelessar per conversation IRL, I'm not positive there is an out-of-the-box way of doing this today.

I think we can likely do this with active/pending request circuit breakers (1 cluster per port), but I think the connection pools might need a bit of modification (in slightly different ways for HTTP/1 and HTTP/2). I also take it that you want a shared single-request limit across both H1 and H2? This is also related to #9215, which I was just discussing with @tonya11en today.

Obviously this could also be done with a custom filter and in-filter queuing if you want to go that route.

@euroelessar
Contributor Author

@mattklein123

I think we can likely do this with active/pending request circuit breakers

Circuit breakers are based on atomic gauges, so they are best-effort and can potentially send two concurrent requests to a single endpoint if Envoy has more than one worker thread.

I also take it that you want a shared single-request limit across both H1 and H2?

Each setup is either H1 or H2; we don't mix protocols within a single sidecar.

Obviously this could also be done with a custom filter and in-filter queuing if you want to go that route.

It looks like it may work, assuming each endpoint is an individual cluster. Effectively, all LB logic would live in a custom HTTP stream filter. One caveat, though, is that every route action has to be aware of the full enumeration of clusters.

As an alternative suggestion: is it possible to make Envoy's load balancer implementations track state? Currently they operate based on stats; is it feasible to give them a "request ended" callback for each "Pick" call?
In that case this problem could be solved by a slightly more complicated, thread-safe version of the "least-requests" load balancer, one that also fast-fails if every alive endpoint already has an active request, as sketched below.
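
A rough sketch of that fast-fail pick (all names hypothetical; Envoy's real Host and LB interfaces differ, and the per-host counters would have to be shared across workers):

#include <atomic>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical per-host state with a cross-worker atomic counter.
struct Host {
  std::atomic<uint32_t> active_requests{0};
};

// Pick a host with no in-flight request, or return nullptr to fast-fail.
std::shared_ptr<Host> pickIdleHost(const std::vector<std::shared_ptr<Host>>& hosts) {
  for (const auto& host : hosts) {
    uint32_t expected = 0;
    // CAS so two workers cannot both claim the same idle host; the proposed
    // "request ended" callback would decrement active_requests later.
    if (host->active_requests.compare_exchange_strong(expected, 1)) {
      return host;
    }
  }
  return nullptr; // Every alive endpoint already has an active request.
}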

@tonya11en
Member

To track state in the Envoy load balancer implementations, you'd need to perform a conditional increment of an integer representing the number of outstanding requests. In your case of only allowing a single concurrent request, something like:

if (num_outstanding < 1) {
  // Racy as written: another worker can pass this check before the
  // increment below happens.
  ++num_outstanding;
  forwardRequest();
}

If that num_outstanding integer is shared across all the worker threads and they all need to have the same world-view (w.r.t. that integer), you won't be able to use a plain atomic int: there would be a race between the load() for the comparison and the increment. I'm pretty sure you're forced into a CAS loop or protecting the variable with a mutex for thread-safe state tracking, which is a hard sell given that it clashes with Envoy's eventually consistent approach to threading. It's possible, but I'm not sure it's worth the scalability sacrifice for the rest of the community (I'll let the maintainers weigh in on that, though).
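
For reference, a CAS loop that closes that load/increment gap might look like the following sketch (not Envoy code; tryAcquire is a hypothetical helper):

#include <atomic>
#include <cstdint>

std::atomic<uint32_t> num_outstanding{0};

// Hypothetical helper: atomically increment num_outstanding only while it is
// below `limit`; returns true if the caller may forward the request.
bool tryAcquire(uint32_t limit) {
  uint32_t current = num_outstanding.load();
  while (current < limit) {
    // compare_exchange closes the gap between the load and the increment; on
    // failure, `current` is refreshed with the value another worker wrote.
    if (num_outstanding.compare_exchange_weak(current, current + 1)) {
      return true;
    }
  }
  return false; // Already at the limit; reject or queue the request.
}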

I think a custom filter is the only approach here if you absolutely must have strict consistency.

@mattklein123
Member

It looks like it may work, assuming each endpoint is an individual cluster. Effectively, all LB logic would live in a custom HTTP stream filter. One caveat, though, is that every route action has to be aware of the full enumeration of clusters.

Yeah, you would just look up the cluster in the filter, keep track of the number of upstream requests going there, and deal with holding, overflow, etc. in the filter. I don't think it would be that hard to implement. I do think there might be use for a general queuing filter, but it would need more thought.
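
A self-contained sketch of the bookkeeping such a filter could lean on (all names hypothetical; in real Envoy the resume callback would additionally need to be posted back to the owning worker's dispatcher):

#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>

// Hypothetical per-cluster gate: admits up to `limit` concurrent requests
// and parks the rest until a slot frees up.
class RequestGate {
public:
  explicit RequestGate(uint32_t limit) : limit_(limit) {}

  // Returns true if the request may proceed now; otherwise queues `resume`,
  // which the filter would wire to something like continueDecoding().
  bool admit(std::function<void()> resume) {
    std::lock_guard<std::mutex> lock(mu_);
    if (active_ < limit_) {
      ++active_;
      return true;
    }
    waiting_.push(std::move(resume));
    return false;
  }

  // Called when an upstream request finishes: hand the slot to a waiter,
  // or give it back if nobody is parked.
  void release() {
    std::function<void()> next;
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (waiting_.empty()) {
        --active_;
        return;
      }
      next = std::move(waiting_.front());
      waiting_.pop();
    }
    next(); // The slot transfers to the resumed request.
  }

private:
  const uint32_t limit_;
  std::mutex mu_;
  uint32_t active_{0};
  std::queue<std::function<void()>> waiting_;
};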

In that case this problem could be solved by a slightly more complicated, thread-safe version of the "least-requests" load balancer, one that also fast-fails if every alive endpoint already has an active request.

There is a proposal to allow adding custom load balancers. See #5598. We already allow custom load balancers via custom clusters, so I don't think this would be very hard to implement if you want to go this route.

cc @snowp for additional thoughts.

@snowp
Contributor

snowp commented Dec 7, 2019

I haven't given this a ton of thought, but I imagine we can improve the consistency by doing something like the following, since the increment is an atomic write + read:

if (++resource > max) {
  // The increment pushed us over the limit, so undo it and fail.
  --resource;
  return ...;
}

If that solves the consistency issue, then either doing one endpoint per cluster or introducing per-endpoint circuit breakers might be the way to achieve 1 active request per endpoint.

@euroelessar
Contributor Author

If that num_outstanding integer is shared across all the worker threads and they all need to have the same world-view (w.r.t. that integer), you won't be able to use a plain atomic int: there would be a race between the load() for the comparison and the increment. I'm pretty sure you're forced into a CAS loop or protecting the variable with a mutex for thread-safe state tracking, which is a hard sell given that it clashes with Envoy's eventually consistent approach to threading. It's possible, but I'm not sure it's worth the scalability sacrifice for the rest of the community (I'll let the maintainers weigh in on that, though).

I think a custom filter is the only approach here if you absolutely must have strict consistency.

It's acceptable for us to have our own LB extension that pays the contention cost; there's no need to add this burden to widely used load-balancing policies.

Yeah, you would just look up the cluster in the filter, keep track of the number of upstream requests going there, and deal with holding, overflow, etc. in the filter. I don't think it would be that hard to implement. I do think there might be use for a general queuing filter, but it would need more thought.

Will check how hard it is; at first glance it has the side effect of producing separate per-endpoint stats, though.
What interface can a filter use to specify which cluster to use? The only option I see is to set a custom metadata entry or header and explicitly list all available clusters in every route (if the sidecar is route-aware).

@mattklein123
Member

It's acceptable for us to have our own LB extension that pays the contention cost; there's no need to add this burden to widely used load-balancing policies.

There is a bunch of discussion now around custom LB extensions (vs. the LB/cluster pair we have now). See #5598. cc @snowp who has been thinking about this a bunch.

Will check how hard it is; at first glance it has the side effect of producing separate per-endpoint stats, though.

See a cluster's alt_stat_name, which would let you unify most of the stats you care about (probably). This might be an OK compromise.

What interface can a filter use to specify which cluster to use? The only option I see is to set a custom metadata entry or header and explicitly list all available clusters in every route (if the sidecar is route-aware).

A hacky solution would be to set a header, etc. and then refresh the route cache to force the request to go to the cluster you care about. It's possible you could achieve something similar with the subset LB, but I don't have any experience with that. A better solution would be to allow a filter to actually provide the route that the request should use during subsequent processing. This has been discussed several times.
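
Roughly, that hacky version inside a decoder filter might look like this fragment (x-target-cluster and picked_cluster are made-up names; clearRouteCache() is the existing hook for forcing route re-evaluation):

// Steer the request by header, then re-run route matching so a route with a
// matching header rule selects the per-endpoint cluster.
request_headers.setCopy(Http::LowerCaseString("x-target-cluster"), picked_cluster);
decoder_callbacks_->clearRouteCache();
return Http::FilterHeadersStatus::Continue;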

@stale

stale bot commented Jan 19, 2020

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale bot added the stale (stalebot believes this issue/PR has not been touched recently) label Jan 19, 2020
@stale

stale bot commented Jan 26, 2020

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

stale bot closed this as completed Jan 26, 2020