
[Serve] Improve scalability of Serve DeploymentHandles #44784

Closed
JoshKarpel opened this issue Apr 16, 2024 · 7 comments · Fixed by #45063
Labels
enhancement (Request for new feature and/or capability), @external-author-action-required (Alternate tag for PRs where the author doesn't have labeling permission), P1 (Issue that should be fixed within a few weeks), serve (Ray Serve Related Issue)

Comments

@JoshKarpel
Contributor

What happened + What you expected to happen

More context: https://ray-distributed.slack.com/archives/CNCKBBRJL/p1713194071772759

In a previous issue (#44226) I described our use of Ray Serve to create dynamic applications/deployments. Well, we're hosting a lot of models, and we just ran into

if cache_key in self._evicted_handle_keys:
    logger.warning(
        "You just got a ServeHandle that was evicted from internal "
        "cache. This means you are getting too many ServeHandles in "
        "the same process, this will bring down Serve's performance. "
        "Please post a github issue at "
        "https://github.com/ray-project/ray/issues to let the Serve "
        "team to find workaround for your use case."
    )
which says to post an issue :) That warning was added in #18980 / #19162.

So we're doing pretty much exactly what this warns against: getting lots of handles in our ingress application, on the order of one handle per deployed model per ingress replica, which right now is something like ~500 models * 10 ingress replicas. The ingress application routes requests to the model applications via those handles.

I see that MAX_CACHED_HANDLES is only 100, so we're definitely blowing past that in each replica:

MAX_CACHED_HANDLES = 100

We’re also consuming a significant chunk of the CONTROLLER_MAX_CONCURRENCY of 15000, which I assume means that if we exceed 15k handles they’ll suddenly stop working https://github.com/ray-project/ray/blob/9cb1dc9e682a087a32f47838fa02ca35f9b1b6ba/python/ray/serve/_private/constants.py#L94C1-L94C27

What we actually observed is that serve.get_app_handle in our ingress application got really slow. Seems like the Serve controller was too busy to respond to the two .remote calls that get_app_handle makes to the controller? (See #44782 for some discussion around making those calls async.)

In the short term, we're looking at creating DeploymentHandles manually (without going through get_app_handle), because we already know the application and deployment name to target and don't need to ask the controller anything. That resolves the initial latency of getting the handles, but doesn't fix the problem of the controller getting bogged down with all these tasks that scale with the number of handles (listen_for_change and record_handle_metrics). The concurrency limits in the Serve Controller will also put a hard cap on our ability to scale the number of dynamic apps/deployments we're hosting.
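In case it's useful to anyone else, here's a minimal sketch of that workaround, assuming the DeploymentHandle constructor in ray.serve.handle accepts a deployment name and application name positionally as it does in Ray 2.9 (it's a private constructor, so this may change between releases; the naming convention below is ours, not Serve's):

# Hypothetical workaround: construct handles directly instead of asking the
# controller via serve.get_app_handle(). Assumes the private constructor
# signature DeploymentHandle(deployment_name, app_name) from Ray 2.9, and that
# this runs inside a Serve deployment (our ingress), where the Serve context
# already exists.
from ray.serve.handle import DeploymentHandle


def make_handle(app_name: str) -> DeploymentHandle:
    # We already know each dynamic app's ingress deployment name (the naming
    # convention here is just an example of ours), so no controller round-trip
    # is needed merely to build the handle.
    return DeploymentHandle(f"{app_name}_ingress", app_name)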

What we expected to happen is that it shouldn't matter how many handles we make - that was wrong because the handles need some state from the controller to do their scheduling! But hopefully it can scale more efficiently than it does right now.

Versions / Dependencies

Ray 2.9.3, though it looks like this didn't change in Ray 2.10.x
Python 3.10.x

Reproduction script

Working on this, but TLDR: make a lot of apps/deployments (>100), make a lot (>100) of handles to them in the same process, and observe the load on the Serve Controller.
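Here's the rough shape of it, using only public Serve APIs (the toy Model class, the app names, and NUM_APPS are placeholders; the exact counts needed to visibly bog down the controller will depend on the cluster):

# Sketch of a reproduction: many apps, and many handles held in one process.
# The Model class, app names, and NUM_APPS are placeholders.
from ray import serve


@serve.deployment
class Model:
    def __call__(self, request: str) -> str:
        return f"echo: {request}"


NUM_APPS = 200  # comfortably above MAX_CACHED_HANDLES (100)

# Deploy one tiny app per "model" (slow, since serve.run waits for each app).
for i in range(NUM_APPS):
    serve.run(Model.bind(), name=f"model-{i}", route_prefix=f"/model-{i}")

# Hold one handle per app in this single process, mimicking an ingress replica.
handles = [serve.get_app_handle(f"model-{i}") for i in range(NUM_APPS)]

# Now watch the Serve controller's CPU usage and event loop lag, plus the
# background thread activity in this process.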

Issue Severity

High: It blocks me from completing my task.

@JoshKarpel JoshKarpel added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 16, 2024
@JoshKarpel JoshKarpel changed the title [Serve] [Serve] Improve efficiency of Serve DeploymentHandles Apr 16, 2024
@JoshKarpel JoshKarpel changed the title [Serve] Improve efficiency of Serve DeploymentHandles [Serve] Improve scalability of Serve DeploymentHandles Apr 16, 2024
@anyscalesam anyscalesam added the serve Ray Serve Related Issue label Apr 18, 2024
@edoakes edoakes added enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks and removed bug Something that is supposed to be working; but isn't labels Apr 18, 2024
@JoshKarpel
Contributor Author

Some further thoughts on this issue as we've continued to investigate on our end:

It seems like the root problem here is that DeploymentHandles start up various background tasks (LongPollClient) and even full background threads (the MetricsPusher https://github.com/ray-project/ray/blob/releases/2.9.1/python/ray/serve/_private/router.py#L1018) to communicate with the Serve Controller. This seems to be causing two problems when there's a large number of DeploymentHandles:

  • The Serve Controller gets very busy
  • The Python process where the DeploymentHandles live also has a lot of background work to do, and can appear to freeze while the background threads are running.

Perhaps these costs can be amortized by sending/receiving these updates per-process instead of per-handle? I'm imagining something like each DeploymentHandle registering itself with some background global object that has a single LongPollClient and a single MetricsPusher (and any other per-handle background tasks). It would collect/distribute/etc. for all the handles together so that, while the amount of work is perhaps roughly the same, it's grouped into larger chunks, which will hopefully make it more efficient.
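To make that concrete, here's a very rough sketch of the shape I have in mind - none of these classes exist in Serve today, the names are made up, and a real version would live inside the handle/router internals:

# Hypothetical per-process multiplexer for handle <-> controller traffic.
# None of these classes exist in Serve today; this is just the shape of the idea.
import threading
from typing import Callable, Dict


class SharedHandleRegistry:
    """One LongPollClient and one MetricsPusher per process, shared by all handles."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self) -> None:
        # handle key -> callback that updates that handle's routing state
        self._update_callbacks: Dict[str, Callable[[object], None]] = {}
        # handle key -> latest autoscaling metrics reported by that handle
        self._pending_metrics: Dict[str, dict] = {}

    @classmethod
    def get_or_create(cls) -> "SharedHandleRegistry":
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
                # A real implementation would start ONE LongPollClient and ONE
                # metrics-pushing loop here for the whole process.
            return cls._instance

    def register_handle(self, key: str, on_update: Callable[[object], None]) -> None:
        self._update_callbacks[key] = on_update

    def record_metrics(self, key: str, metrics: dict) -> None:
        # Handles drop their metrics here; the single pusher would batch them
        # into one controller RPC per interval instead of one RPC per handle.
        self._pending_metrics[key] = metrics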

Thoughts?

@JoshKarpel
Contributor Author

Oh, an important note on the above that I should highlight: because the Serve HTTP proxy also uses DeploymentHandles, we think it will run into the same issues if someone deploys a large number of applications through the static Serve config. In fact, the proxy also bypasses the cache in ServeControllerClient.get_handle when it pre-allocates handles for every ingress deployment: https://github.com/ray-project/ray/blob/master/python/ray/serve/_private/proxy_router.py#L66-L73 . So I don't want to imply that the problem is limited to our use case with dynamic apps - I suspect that anyone trying to deploy large numbers of separate Serve apps on the same cluster will run into this scaling problem.

@JoshKarpel
Contributor Author

Oh! It looks like some work was already done recently on the MetricsPusher in #43125, but we just aren't on that version yet. We'll report back once we upgrade.

@anyscalesam anyscalesam added @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 24, 2024
@anyscalesam
Contributor

Thank you for continuing to update progress on this @JoshKarpel > let us know whether the problem still persists once you get up to Ray 2.11. cc @edoakes

@JoshKarpel
Contributor Author

JoshKarpel commented Apr 30, 2024

@anyscalesam I've been chatting with @edoakes on Slack, I think #45063 will be sufficient to unblock us for now!

@JoshKarpel
Contributor Author

Another potential long-term improvement that I batted around on Slack was to separate the Serve Controller into multiple actors, each with fewer responsibilities - perhaps one to gather autoscaling metrics, one to push replica updates to handles, and one to host the actual control loop that reconciles Serve config to deployments and uses autoscaling metrics to make decisions.
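Purely as an illustration of the split (these actors don't exist in Serve; the names and responsibility boundaries below are made up):

# Hypothetical decomposition of the Serve controller into smaller actors.
# The actor names and responsibility boundaries are illustrative only.
import ray


@ray.remote
class AutoscalingMetricsCollector:
    """Receives metrics pushes from handles/replicas; the control loop polls it."""

    def __init__(self) -> None:
        self._metrics: dict = {}

    def record(self, reporter_id: str, metrics: dict) -> None:
        self._metrics[reporter_id] = metrics

    def snapshot(self) -> dict:
        return dict(self._metrics)


@ray.remote
class HandleUpdateBroadcaster:
    """Answers long-poll requests from handles with replica membership changes."""

    def __init__(self) -> None:
        self._latest_state: dict = {}

    def update_state(self, deployment_id: str, replicas: list) -> None:
        self._latest_state[deployment_id] = replicas


@ray.remote
class ControlLoop:
    """Reconciles the declared Serve config with what is actually deployed."""

    def __init__(self, metrics_collector, broadcaster) -> None:
        # The control loop would read autoscaling metrics from the collector and
        # push resulting replica changes out through the broadcaster.
        self._metrics_collector = metrics_collector
        self._broadcaster = broadcaster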

edoakes pushed a commit that referenced this issue Apr 30, 2024

## Why are these changes needed?

In our experiments, adjusting the control loop interval upward helps the Serve Controller keep up with a large number of autoscaling metrics pushes from a large number of `DeploymentHandle`s (the loop body is blocking, so increasing the interval lets more other code run while the control loop isn't running), at the cost of control loop responsiveness (since it doesn't run as often).

## Related issue number


Closes #44784 ... for now!

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [x] This PR is not tested :(

Signed-off-by: Josh Karpel <[email protected]>
@JoshKarpel
Contributor Author

The other thought was around letting the Serve Controller apply more backpressure to the metrics pushers when it is overloaded, so that they don't stack up indefinitely.
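For example (hypothetical names, plain asyncio rather than Serve's actual internals), a pusher could refuse to have more than one report in flight at a time, so that an overloaded controller slows the pusher down instead of accumulating an unbounded backlog:

# Hypothetical client-side backpressure for metrics pushes: wait for the
# controller to acknowledge the previous report before sending the next one.
# Names here are illustrative only, not Serve internals.
import asyncio


class BackpressuredMetricsPusher:
    def __init__(self, push_fn, get_snapshot, interval_s: float = 10.0) -> None:
        self._push_fn = push_fn            # async callable that sends one report
        self._get_snapshot = get_snapshot  # callable returning current metrics
        self._interval_s = interval_s

    async def run(self) -> None:
        while True:
            # Awaiting the previous push before sending the next one means an
            # overloaded controller naturally slows this loop down instead of
            # letting unacknowledged reports pile up indefinitely.
            await self._push_fn(self._get_snapshot())
            await asyncio.sleep(self._interval_s)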
