[Serve] Optimize the _get_live_deployments function #45793

Open
jugalshah291 opened this issue Jun 7, 2024 · 0 comments
Labels
enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks serve Ray Serve Related Issue

jugalshah291 commented Jun 7, 2024

Description

The _get_live_deployments function internally calls get_deployments_in_application, which iterates over self._deployment_state (which stores all the live deployments) and selects the deployments that belong to a specific app. This is inefficient, especially when a large number of Serve applications are running: for every Serve application, we iterate over all the deployments.

Maybe we can optimize it by defining a new dictionary (in place of self._deployment_state) that maps each app name to the names of its live deployments. However, I don't know much about the Ray internals, and there could be a better way around this (we could rethink the class structures).
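To make the idea concrete, here is a minimal sketch of a per-app index. It is not Ray's actual code: DeploymentID, DeploymentIndex, _names_by_app, and the add/remove methods are hypothetical stand-ins, and the sketch keeps the index alongside the main dictionary rather than replacing it. The scan method mirrors the per-call iteration described above, while the indexed method only touches the deployments of the requested app.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class DeploymentID:
    """Hypothetical key: a deployment name plus the app it belongs to."""
    name: str
    app_name: str


class DeploymentIndex:
    """Toy registry for illustration only; not Ray's real state manager."""

    def __init__(self) -> None:
        # Primary store: every live deployment, keyed by its ID.
        self._deployment_states: Dict[DeploymentID, object] = {}
        # Secondary index (assumption): app name -> names of its live deployments.
        self._names_by_app: Dict[str, Set[str]] = defaultdict(set)

    def add(self, dep_id: DeploymentID, state: object) -> None:
        # Both structures are updated together so they never drift apart.
        self._deployment_states[dep_id] = state
        self._names_by_app[dep_id.app_name].add(dep_id.name)

    def remove(self, dep_id: DeploymentID) -> None:
        self._deployment_states.pop(dep_id, None)
        names = self._names_by_app.get(dep_id.app_name)
        if names is not None:
            names.discard(dep_id.name)
            if not names:
                # Drop empty entries so the index does not grow without bound.
                del self._names_by_app[dep_id.app_name]

    def get_deployments_in_application_scan(self, app_name: str) -> List[str]:
        # The approach described in this issue: visits every live deployment.
        return sorted(
            d.name for d in self._deployment_states if d.app_name == app_name
        )

    def get_deployments_in_application(self, app_name: str) -> List[str]:
        # Indexed lookup: cost depends only on the size of this one app.
        return sorted(self._names_by_app.get(app_name, ()))
```

The trade-off is that every code path that adds or deletes a deployment has to update both structures, which is why the sketch funnels all mutations through add and remove.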

Use case

Our system is running a very large number of DeploymentHandles (see #44784 for more details). We've noticed that the Serve controller gets overloaded (>100% CPU usage), and a small percentage of this CPU usage comes from the _get_live_deployments (get_deployments_in_application) function, which is called in multiple places.

image

Uploading controller_1200_kuberay_hack.svg…

jugalshah291 added the enhancement and triage (Needs triage, e.g. priority, bug/not-bug, and owning component) labels on Jun 7, 2024
anyscalesam added the serve label on Jun 12, 2024
zcin added the P1 label and removed the triage label on Jun 27, 2024
3 participants