[Serve] Improve scalability of Serve `DeploymentHandle`s #44784
Comments
Some further thoughts on this issue as we've continued to investigate on our end: it seems like the root problem here is that the controller's work (the `listen_for_change` long-polls and the `record_handle_metrics` pushes) scales with the total number of `DeploymentHandle`s, since each handle talks to the controller independently.
Perhaps these costs can be amortized by sending/receiving these updates per-process instead of per-handle? I'm imagining something like each process maintaining a single connection to the controller that all of the handles in that process share. Thoughts?
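To make the per-process idea a bit more concrete, here is a very rough sketch of what a shared update client might look like. Everything here is hypothetical: `ProcessWideUpdateClient`, `register_handle`, and the batched `listen_for_changes_batched` controller method are made-up names for illustration, not Ray Serve APIs.

```python
# Hypothetical sketch only -- none of these names exist in Ray Serve.
import asyncio
from collections import defaultdict


class ProcessWideUpdateClient:
    """One long-poll loop per process; every handle in the process subscribes
    to it instead of opening its own long-poll to the controller."""

    def __init__(self, controller_actor):
        self._controller = controller_actor
        self._callbacks = defaultdict(list)  # deployment_id -> [callback, ...]
        self._poll_task = None

    def register_handle(self, deployment_id, callback):
        self._callbacks[deployment_id].append(callback)
        if self._poll_task is None:
            self._poll_task = asyncio.create_task(self._poll_loop())

    async def _poll_loop(self):
        while True:
            # One controller round-trip covers every handle in this process.
            updates = await self._controller.listen_for_changes_batched.remote(
                list(self._callbacks.keys())
            )
            for deployment_id, state in updates.items():
                for callback in self._callbacks[deployment_id]:
                    callback(state)
```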
Oh, important note on the above that I should highlight: because the Serve HTTP proxy is also using `DeploymentHandle`s internally, it is subject to the same per-handle costs.
Oh! It looks like some work was already done recently in this area.
Thank you for continuing to update progress on this, @JoshKarpel. Let us know when you get up to Ray 2.11 and whether the problem still persists. cc @edoakes
@anyscalesam I've been chatting with @edoakes on Slack; I think #45063 will be sufficient to unblock us for now!
Another potential long-term improvement that I batted around on Slack was to separate the Serve Controller into multiple actors, each with fewer responsibilities - perhaps one to gather autoscaling metrics, one to push replica updates to handles, and one to host the actual control loop that reconciles the Serve config to deployments and uses the autoscaling metrics to make decisions.
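A very rough sketch of what that split could look like, purely for illustration (these actor names and methods are hypothetical, not an existing or planned Ray Serve design):

```python
# Hypothetical sketch of the "split controller" idea -- illustrative only.
import ray


@ray.remote
class AutoscalingMetricsCollector:
    """Absorbs the high-volume metrics pushes from DeploymentHandles."""

    def __init__(self):
        self._metrics = {}

    def record_handle_metrics(self, handle_id, data):
        self._metrics[handle_id] = data

    def get_aggregated_metrics(self):
        return dict(self._metrics)


@ray.remote
class HandleUpdatePusher:
    """Serves the long-poll traffic that keeps handles' replica sets fresh."""

    def __init__(self):
        self._replica_sets = {}

    def update_replica_set(self, deployment_id, replicas):
        self._replica_sets[deployment_id] = replicas

    def get_replica_set(self, deployment_id):
        return self._replica_sets.get(deployment_id)


@ray.remote
class ReconcilerControlLoop:
    """Hosts only the config-to-deployment reconciliation and autoscaling decisions."""

    def __init__(self, metrics_collector, update_pusher):
        self._metrics_collector = metrics_collector
        self._update_pusher = update_pusher

    async def run_control_loop_once(self):
        metrics = await self._metrics_collector.get_aggregated_metrics.remote()
        # ... reconcile target state and push resulting replica sets ...
```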
## Why are these changes needed?

In our experiments, adjusting this value (the control loop interval) upward helps the Serve Controller keep up with a large number of autoscaling metrics pushes from a large number of `DeploymentHandle`s: because the loop body is blocking, increasing the interval lets more other code run while the control loop isn't running. The cost is control loop responsiveness, since the loop doesn't run as often.

## Related issue number

Closes #44784 ... for now!

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [x] This PR is not tested :(

Signed-off-by: Josh Karpel <[email protected]>
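To illustrate the trade-off described above with a toy asyncio example (this is not Ray Serve code; the constant name and timings are made up): when the loop body blocks the event loop, a longer interval between iterations leaves a larger fraction of time for other coroutines, at the cost of the loop reacting to changes less often.

```python
# Toy illustration of the blocking-control-loop trade-off. Not Ray Serve code.
import asyncio
import time

CONTROL_LOOP_INTERVAL_S = 0.1  # the kind of knob the PR turns up


async def control_loop():
    while True:
        time.sleep(0.05)  # stand-in for blocking reconciliation work
        await asyncio.sleep(CONTROL_LOOP_INTERVAL_S)


async def handle_metrics_push(i):
    # Stand-in for other controller work that needs the event loop repeatedly.
    for _ in range(10):
        await asyncio.sleep(0.01)
    return i


async def main():
    asyncio.create_task(control_loop())
    start = time.perf_counter()
    await asyncio.gather(*(handle_metrics_push(i) for i in range(1000)))
    # With a larger CONTROL_LOOP_INTERVAL_S the event loop is blocked a smaller
    # fraction of the time, so the pushes finish sooner.
    print(f"served 1000 pushes in {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```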
The other thought was around letting the Serve Controller apply more backpressure to the metrics pushers when it is overloaded, so that they don't stack up indefinitely.
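For illustration, a minimal version of that kind of backpressure could look like the sketch below; `BackpressuredMetricsSink` and its queue bound are hypothetical, not existing Ray Serve behavior.

```python
# Hypothetical sketch of controller-side backpressure on metrics pushes.
import asyncio

METRICS_QUEUE_MAXSIZE = 1000  # illustrative bound


class BackpressuredMetricsSink:
    def __init__(self):
        self._queue = asyncio.Queue(maxsize=METRICS_QUEUE_MAXSIZE)

    async def record_handle_metrics(self, handle_id, data):
        # When the queue is full this call waits, propagating backpressure to
        # the pushing handle instead of stacking up unbounded work.
        await self._queue.put((handle_id, data))

    async def consume(self):
        while True:
            handle_id, data = await self._queue.get()
            # ... fold into autoscaling state ...
            self._queue.task_done()
```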
What happened + What you expected to happen
More context: https://ray-distributed.slack.com/archives/CNCKBBRJL/p1713194071772759
In a previous issue (#44226) I described our use of Ray Serve to create dynamic applications/deployments. Well, we're hosting a lot of models, and we just ran into the caveat described here:

ray/python/ray/serve/_private/client.py, lines 456 to 464 in 9cb1dc9
So we're doing pretty much exactly what this warns against: getting lots of handles in our ingress application, on the order of one handle per deployed model per ingress replica, which right now is something like ~500 models * 10 ingress replicas. The ingress application routes requests to the model applications via those handles.
I see that `MAX_CACHED_HANDLES` is only `100`, so we're definitely blowing past that in each replica:

ray/python/ray/serve/_private/constants.py, line 90 in 9cb1dc9
We're also consuming a significant chunk of the `CONTROLLER_MAX_CONCURRENCY` of `15000`, which I assume means that if we exceed 15k handles they'll suddenly stop working: https://github.com/ray-project/ray/blob/9cb1dc9e682a087a32f47838fa02ca35f9b1b6ba/python/ray/serve/_private/constants.py#L94

What we actually observed is that `serve.get_app_handle` in our ingress application got really slow. It seems like the Serve controller was too busy to respond to the two `.remote` calls that `get_app_handle` makes to the controller? (See #44782 for some discussion around making those calls `async`.)
In the short term, we're looking at creating `DeploymentHandle`s manually (without going through `get_app_handle`), because we already know the application and deployment name to target and don't need to ask the controller anything. That resolves the initial latency of getting the handles, but doesn't fix the problem of the controller getting bogged down with all these tasks that scale with the number of handles (`listen_for_change` and `record_handle_metrics`). The concurrency limits in the Serve Controller will also put a hard block on our ability to scale the number of dynamic apps/deployments we're hosting.

What we expected to happen is that it shouldn't matter how many handles we make - that was wrong, because the handles need some state from the controller to do their scheduling! But hopefully it can scale more efficiently than it does right now.
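For reference, the workaround we're experimenting with looks roughly like the sketch below. Constructing a `DeploymentHandle` directly is a private Ray Serve interface, so the constructor arguments shown are an assumption based on Ray 2.9.x and may differ between versions.

```python
# Rough sketch of the short-term workaround: build the handle directly from
# the already-known app/deployment names instead of asking the controller via
# serve.get_app_handle. NOTE: direct construction of DeploymentHandle is a
# private interface; the positional arguments here are an assumption.
from ray.serve.handle import DeploymentHandle


def get_handle_without_controller(deployment_name: str, app_name: str) -> DeploymentHandle:
    # For an "app handle" this would target the app's ingress deployment,
    # whose name we already know in our setup.
    return DeploymentHandle(deployment_name, app_name)
```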
Versions / Dependencies
Ray 2.9.3, though it looks like this didn't change in Ray 2.10.x
Python 3.10.x
Reproduction script
Working on this, but the TL;DR is: create a lot of apps/deployments (>100), create a lot (>100) of handles to them in the same process, and observe the load on the Serve Controller.
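A rough sketch of such a reproduction, with illustrative numbers, names, and a trivial model body (not a tested script):

```python
# Reproduction sketch: many apps, many handles held in one process.
from ray import serve

NUM_APPS = 200  # > MAX_CACHED_HANDLES (100)


@serve.deployment
class Model:
    def __call__(self, payload: str) -> str:
        return "ok"


@serve.deployment
class Ingress:
    def __init__(self):
        # One handle per model app held in this single process, mirroring the
        # "handles scale with the number of models" pattern from the issue.
        self.handles = {
            f"model_{i}": serve.get_app_handle(f"model_{i}")
            for i in range(NUM_APPS)
        }

    async def __call__(self, http_request) -> str:
        name = http_request.query_params["model"]
        return await self.handles[name].remote("ping")


if __name__ == "__main__":
    for i in range(NUM_APPS):
        serve.run(Model.bind(), name=f"model_{i}", route_prefix=f"/model_{i}")
    serve.run(Ingress.bind(), name="ingress", route_prefix="/")
```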
Issue Severity
High: It blocks me from completing my task.