Don't double account for requests going through the activator. #3301
Regarding the attack vector in Proposal 2: it seems it is possible to sanitise headers on the latest master of Istio (istio/istio#8340), though I haven't tested it myself. So we could restrict users from using certain system headers by pruning them if they are present. Maybe it is something we could benefit from after migrating to Istio 1.1?
Maybe. We shouldn't wait for Istio 1.1 for this, though.
Talked to Kenny and he's going to take this on. /assign @k4leung4 /milestone "Serving 0.5"
@markusthoemmes: You must be a member of the knative/knative-milestone-maintainers GitHub team to set the milestone.
/milestone "Serving 0.5" |
@mattmoor: The provided milestone is not valid for this repository.
/milestone Serving 0.5
/assign @hohaichi
As I won't be able to work on it anymore.
@k4leung4: GitHub didn't allow me to assign the following users: hohaichi. Note that only knative members and repo collaborators can be assigned.
/assign
Proposal 3: Let the autoscaler decide
Intuitively, this means the autoscaler does not count a pending request at the queue-proxy if it is already counted at the activator. This is straightforward during cold start, when all requests go through the activator, and during steady state, when no requests go through the activator. The grey area is when the revision has just come up and is receiving requests directly, yet there are still pending requests at the activator. In that case, we may discount a request x that comes directly to the queue-proxy against a request y that is pending at the activator but not at the queue-proxy. When we compute the total concurrency Q(t) + A(t), y is counted (in A(t)) and x is not. However, we can turn it around and consider x counted and y not counted, since the total is the same. With respect to the revision, this is correct, because x affects the revision while y does not. What we lose is only a look-ahead for requests that have reached the activator and are going to reach the revision.

Taking a closer look, y being pending at the activator but not at the queue-proxy can mean only one of two things: y has not yet arrived at the queue-proxy, or y has already finished at the queue-proxy. If y has not arrived at the queue-proxy, it will likely arrive in the next window. With the current math, the queue-proxy contributes 6 data points to the total concurrency, so not counting y in one data point loses only 1/6 of a count. If y has finished at the queue-proxy, then not counting y is actually more accurate with respect to the load on the revision now and in the near future.

Compared to proposals 1 and 2: Proposal 1 can drop requests during cold start, while they are in flight between the activator and the queue-proxy. Proposal 3 can drop in-flight requests only when the revision is up and running, and an in-flight request is dropped only to count a request coming directly to the queue-proxy. Not counting requests during cold start is bad, because in-flight requests from the activator may be the only requests coming to the revision. Not counting in-flight requests from the activator when the revision is already handling traffic directly does not matter much, because no new traffic goes to the activator. Proposal 2 arguably has the attack vector mentioned above; proposal 3 doesn't. In addition, proposal 2 introduces a coupling between the activator and the queue-proxy (e.g., the queue-proxy needs to understand the header the activator uses), while proposal 3 simply makes a local calculation.

@mattmoor @markusthoemmes what do you think?
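As a minimal sketch of the local calculation proposal 3 describes, assuming only that the autoscaler sees Q(t) from the queue-proxies and A(t) from the activator (the function and parameter names are illustrative, not the actual Knative code):

```go
package sketch

// adjustedTotalConcurrency sketches proposal 3: the autoscaler discounts
// queue-proxy concurrency that is assumed to already be counted at the
// activator, so no request is counted twice.
//
// queueProxy is Q(t), the summed concurrency reported by the revision's
// queue-proxies; activator is A(t), the concurrency reported by the activator.
func adjustedTotalConcurrency(queueProxy, activator float64) float64 {
	// Discount up to A(t) requests from the queue-proxy signal, since those
	// are assumed to already be counted at the activator...
	discounted := queueProxy - activator
	if discounted < 0 {
		discounted = 0
	}
	// ...then add the activator's own count back. This reduces to
	// max(Q(t), A(t)).
	return activator + discounted
}
```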
I talked to Matt. So we want to have the queue-proxy report metrics for requests going through the activator and for requests not going through the activator separately. Then the autoscaler can decide which metric(s) to use. For this particular issue, the autoscaler will pick only the metric for requests not going through the activator and get a result similar to proposal 2.
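A minimal sketch of what such a split report could look like; the type and field names are illustrative, not the actual Knative stat definition:

```go
package sketch

// podStat is a hypothetical per-pod report in which the queue-proxy splits
// its observed concurrency by how requests arrived.
type podStat struct {
	PodName string

	// Concurrency of requests forwarded by the activator (e.g. recognized
	// via a marker the activator sets).
	ProxiedConcurrency float64

	// Concurrency of requests that reached the queue-proxy directly.
	DirectConcurrency float64
}

// totalConcurrency shows how the autoscaler could avoid the double count for
// this issue: sum only the direct concurrency from the pods and add the
// activator's own report, since proxied requests are already counted there.
func totalConcurrency(pods []podStat, activatorConcurrency float64) float64 {
	total := activatorConcurrency
	for _, p := range pods {
		total += p.DirectConcurrency
	}
	return total
}
```

Summing only DirectConcurrency with the activator's report avoids the double count without the queue-proxy needing to know anything about what the autoscaler does.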
Thanks for the very detailed writeup, @hohaichi. A few thoughts:

These are very important properties, I think. I don't think the second sentence is true (depending on what "window" means here). I read it as "metric collection window", which is 1s in our case (as opposed to the aggregation window of 60s/6s respectively). As the activator now throttles requests based on the number of pods available, it can actually take quite a while for this queue to drain. It also does not directly relate to the concurrency that's reported by the queue-proxies. It's entirely possible to have a backlog of 1000 requests on the activator with only 10 of them being actively proxied at the moment. Discounting the whole backlog from the metrics reported by the queue-proxies will skew things heavily, I fear.

@mattmoor's idea seems to be exactly the spirit of 2., no? Just a question of which component discounts the metric, I guess.

I'm trying to be wary of the complexity of our metrics pipeline. What I like about the current reporting is that it simply sums up to a "concurrency in the system" metric. Once we start splitting metrics or start double reporting we lose that "feature". Can you clarify a bit further what the metrics would look like after the change?
Thank you for your thoughtful feedback, @markusthoemmes. The backlog case you pointed out is not as bad as you fear, if I understand you correctly. When we add the concurrency from the activator and the adjusted concurrency from the queue-proxy, the total becomes the max of the two. So there is theoretically a chance that the backlog never drains, when the requests coming directly to the queue-proxy balance out the backlog. But in practice, we have neither a precise translation from concurrency to compute resources, nor the granularity to provision compute resources for the exact concurrency. There should always be headroom on the compute-resource side (or we have a bigger problem to address), and that headroom is what drains the backlog (a small numeric illustration follows after this comment). By the way, I think there is a chance of the backlog never draining with any scaling algorithm based on concurrency, because requests are not all the same, and scaling based solely on concurrency implicitly assumes that they are.

Anyway, the new behavior should be: the pods scale up and the backlog drains quickly until new requests start coming directly to the revision. Then the backlog drains slowly, and the pods may not scale up enough to keep up with the earlier drain rate. After the backlog completely drains, the pods continue to scale up with the traffic. So the change in behavior is (1) the drain rate can be slower during the scale-up period, and (2) no scale-down after the drain finishes. Is it good or bad? I don't know. I think it depends on the apps.
I think the effect is the same as 2, but the approach is very different. Proposal 2 has the queue-proxy report something other than what it measures; it needs knowledge of what the autoscaler does. It's cleaner to have concurrency consistently mean "number of pending requests" everywhere and to keep the autoscaling logic contained in the autoscaler. This is part of the reason I proposed proposal 3: let the autoscaler decide.
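To make the drain argument above concrete, a made-up example (the numbers are illustrative only): suppose the activator reports A(t) = 100, most of it backlog, while the queue-proxies report Q(t) = 80 from direct traffic. Proposal 3 counts max(80, 100) = 100 instead of 180, so the autoscaler provisions capacity for 100 concurrent requests. Only about 80 of that capacity is consumed by direct traffic; the remaining headroom is what drains the backlog, albeit more slowly than if the full 180 had been counted.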
I drafted PR #3477. I am still working on tests, but I'd like to get early feedback on the approach, the choice of header and metric, etc. @markusthoemmes @mattmoor @tcnghia your input is appreciated.
@hohaichi I think I see what you mean. There is a strong correlation between the amount of workload that can be in flight at a time (essentially defined by the number of pods) and the drain rate from the activator (also basically defined by the number of pods). I can see that not being as bad as I imagined it to be. Maybe we can be a bit more nuanced? We could report 2 numbers from the Activator: the concurrency of the requests it is actively proxying, and the size of the backlog waiting in its queue.

The queue-proxies would still only report the concurrency they observe. That'd slightly alter the calculation on the autoscaler side. Coming full-circle to proposal 1: we have ditched/are ditching the retries in proxyRequest, which were the main reason discounting there was problematic.
@markusthoemmes I think a backlog metric makes a lot of sense. In fact, I think concurrency should be refined into two metrics: backlog, and pending (being processed). Pending gives us an idea of the pods' throughput, and we can dynamically adjust the target concurrency. Backlog can help to troubleshoot/scale Knative components. I also think that backlog should include not only requests in the activator, but also requests queued in the queue-proxy before they are forwarded to the user container. But then that gets us back to not double counting requests in the backlog. :) By the way, do you know why we throttle at the activator? Should we let the requests buffer in the queue-proxy, and have the queue-proxy throttle requests it sends to the user container? The activator is shared, and I feel that it not draining as fast as it could might result in one app hurting another.
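A rough sketch of how a queue-proxy could track backlog and pending separately, assuming it already knows when it forwards a request to the user container; the type and method names are hypothetical, chosen only for illustration:

```go
package sketch

import "sync"

// requestCounter is a hypothetical queue-proxy counter that distinguishes
// requests waiting to be forwarded (backlog) from requests currently being
// processed by the user container (pending).
type requestCounter struct {
	mu      sync.Mutex
	backlog int // queued in the queue-proxy, not yet forwarded
	pending int // forwarded to the user container, not yet finished
}

// arrived is called when a request reaches the queue-proxy.
func (c *requestCounter) arrived() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.backlog++
}

// forwarded is called when the request is handed to the user container.
func (c *requestCounter) forwarded() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.backlog--
	c.pending++
}

// finished is called when the user container has responded.
func (c *requestCounter) finished() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending--
}

// snapshot returns both numbers for the next stat report.
func (c *requestCounter) snapshot() (backlog, pending int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.backlog, c.pending
}
```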
The reason we buffer in the activator is: #1409
@markusthoemmes Thank you for the reference. Do you know if there's any reason other than latency? It seems like a poor trade-off, because we sacrifice reliability for latency. Furthermore, it solves the 0-N case, but does not solve the same problem in 1-N or x-N scaling. Proposal 2 in #1409 (comment) sounds better to me.
In what area(s)?
What version of Knative?
HEAD
Description
Today, we have 2 components informing the autoscaler about the state of the system: the activator and the queue-proxy (per pod). With activator throttling enabled, we can now have the situation where the activator and the queue-proxies simultaneously send valid concurrency data. The autoscaler needs some adjustments in its algorithms to account for that (see #3289), but we also need to make sure that we do not double account for requests.
If a request is proxied by the activator today, both the activator and the receiving queue-proxy will increase their respective concurrency counters to report that.
We discussed 2 ways of fixing that:
Proposal 1: Discount in the activator
We can discount a request in the activator as soon as it hits the proxyRequest method (i.e. as soon as it is handed off to Golang's ReverseProxy). However, that method has internal retries, and we cannot know when the request is really proxied. That can make for blurriness in the autoscaler signal.
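For illustration, a minimal sketch of what discounting in the activator would mean and where the blurriness comes from; the reporter interface and surrounding names are hypothetical, not the actual activator code:

```go
package sketch

import (
	"net/http"
	"net/http/httputil"
)

// concurrencyReporter stands in for whatever the activator uses to track
// in-flight requests per revision. The request was incremented earlier,
// when it arrived at the activator.
type concurrencyReporter interface {
	Increment(revision string)
	Decrement(revision string)
}

// proxyRequest sketches proposal 1: stop counting the request as soon as it
// is handed off to the ReverseProxy. Because retries can happen inside the
// proxying path, the exact moment the request really reaches the queue-proxy
// is unknown, which blurs the autoscaler signal.
func proxyRequest(reporter concurrencyReporter, revision string, proxy *httputil.ReverseProxy, w http.ResponseWriter, r *http.Request) {
	// Discount here, before proxying...
	reporter.Decrement(revision)
	// ...even though the request may not have reached the queue-proxy yet,
	// or may be retried and effectively sent more than once.
	proxy.ServeHTTP(w, r)
}
```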
Proposal 2: Discount in the queue-proxy
The second proposal was to discount the request in the queue-proxy. The activator would set a header that causes the queue-proxy to not take that request into account when reporting its observed concurrency. It's been pointed out that this is a potential attack vector: arbitrary users could set this header as well, which would render autoscaling for that revision inaccurate and could potentially choke the application.
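A minimal sketch of proposal 2 on the queue-proxy side; the header name and counter interface are hypothetical, chosen only for illustration:

```go
package sketch

import "net/http"

// activatorHeader is a hypothetical header name used only for this sketch.
const activatorHeader = "Knative-Proxied-By-Activator"

// concurrencyCounter stands in for whatever the queue-proxy uses to track
// in-flight requests for its stat reports.
type concurrencyCounter interface {
	Increment()
	Decrement()
}

// countingMiddleware sketches proposal 2: requests carrying the activator's
// marker header are still proxied, but not counted towards the reported
// concurrency.
func countingMiddleware(counter concurrencyCounter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get(activatorHeader) == "" {
			// Only requests that did not come through the activator are
			// counted; the decrement runs after the request completes.
			counter.Increment()
			defer counter.Decrement()
		}
		next.ServeHTTP(w, r)
	})
}
```

The sketch also makes the attack vector visible: any client able to set the marker header can make its requests invisible to autoscaling, which is why the header sanitisation discussed above (e.g. via Istio) matters for this approach.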