[Serve] performance bottlenecked by the ProxyActor #42565
Comments
@ShenJiahuan thanks for filing the issue! We're aware of the proxy being the bottleneck for each node. The reason for this design decision is that the proxy includes quite a bit of intelligence to handle dynamic service discovery/routing, features like model multiplexing, and edge cases like properly draining requests upon node removal (e.g., spot instance interruption). We've done some work to ensure performance is within a reasonable bound, but we're aware that this may be a dealbreaker for some use cases. Our likely path to improving this bottleneck would be to optimize the proxy's performance (reduce overhead from Ray actor calls, consider implementing it in a faster language like C++) rather than rearchitect the system to remove it. However, the 70 QPS number you've cited here is dramatically lower than what we see in our microbenchmarks. I re-ran this benchmark on my laptop (2021 MacBook Pro with M1 Max) on the master branch and saw ~800 QPS.
What instance type did you run the benchmark on? Are you able to re-run it on one that is publicly available, such as an AWS instance type?
@edoakes, thank you for the quick and detailed response! Upon reevaluating our setup, I discovered a misconfiguration on our end that constrained the performance. 🥲 After addressing this, I am now observing a throughput of approximately 280 QPS on our corrected setup. Furthermore, I conducted a test on an AWS t3.2xlarge instance (8-core Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz), which yielded around 175 QPS. While this is a marked improvement over the initial 70 QPS, it seems that we could still encounter scalability limitations under high-load scenarios.
I am hitting this issue in my task too. I re-ran the script you provided and the throughput is approximately 220 QPS. Could you tell me what misconfiguration you adjusted to improve the performance? Thanks.
@edoakes Hi! I would like to know what factors might affect the performance level of the ProxyActor, such as the configuration of the server or the complexity of the application. If there are multiple nodes, is the improvement linear? We are hoping to use Ray Serve in a production environment. Given our current observations, this performance issue is very important to us. Do you know when there might be significant progress on this?
@xjhust, a throughput of 220 QPS appears to be reasonable and does not necessarily indicate a misconfiguration, especially when compared with my subsequent experiments. The performance is closely tied to the single-core performance of the CPU. The 800 QPS achieved by @edoakes is likely due to the superior single-core performance of the M1 Max. In fact, our server's CPU was previously set to 'powersave' mode, which resulted in a reduced performance of approximately 70 QPS.
Yes, this makes sense given that the proxy is currently a Python process that is largely single-threaded.
The overall throughput scaling is roughly linear as you add more nodes to the cluster, since each node has its own proxy (and they scale independently). We are making incremental progress on this all the time (example), but I wouldn't expect an orders-of-magnitude improvement in the near future (i.e., the next few releases). Most of our users' workloads involve fairly heavyweight ML inference requests, so supporting very high QPS per node is often not the primary goal, and we are instead focusing our efforts on other areas like improved autoscaling, stability, and observability.
Any updates? |
@decadance-dance given the limited information you have provided, I doubt that what you're seeing is the same issue described in this ticket. If you get in touch on the Ray Slack or file a separate issue and provide more details about your setup, we may be able to point out some improvements.
Working on an investigation; will update here with our benchmark results.
I tested on a 16-core Linux machine with an i9-11900K, but it might be a bit overpowered, so I moved the test to a typical cloud bare-metal machine at 2.6 GHz. (See the results below; both tests are based on Ray 2.32.0.)
Cloud results: on a bare-metal machine with 128 cores at 2.6 GHz, I get ~180 QPS with latency of 500-1000 ms, which is pretty low.
What happened + What you expected to happen
Issue Summary
We've identified a potential performance bottleneck within Ray Serve due to the limitation of having a single ProxyActor per node. This architecture may be hindering scalability and maximum request handling capacity.
Performance Test Details
To evaluate the performance implications, we conducted a test using the following setup:
The test results indicated that the system is currently capped at approximately 70 requests per second.
Source Code Review and Concerns
Upon reviewing the Ray source code, it became evident that, by design, a single ProxyActor per node is responsible for handling all incoming requests, replica selection, and other associated tasks. This centralized approach is likely what prevents Ray Serve from scaling effectively with the available hardware resources.
Proposed Discussion Points
I believe addressing this bottleneck could significantly improve Ray Serve's performance and scalability. I look forward to the community's input and potential solutions.
Versions / Dependencies
Ray 2.9.1
Reproduction script
Test Code Snippet
To reproduce the performance issue, please use the following code which sets up a Ray Serve deployment with 48 replicas:
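The exact script is not reproduced in this thread, so the following is a minimal sketch of what such a deployment might look like; the `Hello` class name and the trivial handler body are assumptions, while `num_replicas=48` matches the description above:

```python
# Minimal Ray Serve app with 48 replicas of a trivial handler, so throughput
# is limited by the per-node proxy rather than by the handler itself.
from ray import serve


@serve.deployment(num_replicas=48)
class Hello:
    async def __call__(self, request) -> str:
        # Return immediately; no model work, so the measured QPS reflects
        # proxy/routing overhead rather than inference time.
        return "hello"


app = Hello.bind()
```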
Starting Ray Serve
Ray Serve is initiated with the following command:
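The original command is not shown; assuming the sketch above is saved as `proxy_bench.py` (a hypothetical filename), the app could be started with the Ray Serve CLI:

```bash
serve run proxy_bench:app
```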
Load Testing
For stressing the service and measuring the throughput, we are using ApacheBench with the following command:
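The exact invocation is not shown; a typical ApacheBench command against Ray Serve's default HTTP port (8000) might look like the following, with illustrative values for the total request count (-n) and concurrency (-c):

```bash
ab -n 20000 -c 100 http://127.0.0.1:8000/
```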
Issue Severity
High: It blocks me from completing my task.