[Serve] Add a timeout parameter for scheduling ray tasks to replicas #40995
Comments
Chatted with @akshay-anyscale: this is a pretty good idea. @kyle-v6x would you be willing to contribute back here?
@anyscalesam Sure! My only concern is that the 2.10 release adds a new parameter that lets us shed load based on a maximum replica queue size. In practice, this queue-size-based approach probably works for non-time-sensitive jobs, but it is quite awkward for anything coming from an HTTP request. It can achieve the same thing in two ways, but each has a major issue:
Currently, we use the second option with … Personally, I think it would be nice to add something like …
I don't quite understand why this is a concern for your feature request. As a side note, I would not recommend setting …

Adding an assignment timeout makes sense. If you're open to contributing it, I'd be happy to point you in the right direction.
@edoakes Thanks for clarifying. Got a little too tunnel-visioned on our own use case, but I see how … And yes, totally agree regarding setting …

I'll be away for a few days but happy to work on a solution as soon as I return. Would you recommend adding it as an additional deployment parameter then?
One more note. We tried the following pattern, but I haven't dug into whether the returned reference really means that the task has been assigned to a replica.
Now: …
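The snippets referenced above were not preserved in this thread. As a rough sketch of the kind of pattern being described, assuming the Ray Serve `DeploymentHandle` API and its private `DeploymentResponse._to_object_ref()` method (a private API, so behavior may differ across versions), with a made-up timeout value and deployment names:

```python
import asyncio

from ray import serve
from ray.serve.handle import DeploymentHandle

ASSIGNMENT_TIMEOUT_S = 5.0  # illustrative value only


@serve.deployment
class Dispatcher:
    def __init__(self, downstream: DeploymentHandle):
        self._downstream = downstream

    async def __call__(self, request) -> str:
        response = self._downstream.remote(request)
        try:
            # Wait a bounded amount of time for the underlying object ref.
            # The open question in this thread is whether this resolving
            # really means the request has been assigned to a replica.
            obj_ref = await asyncio.wait_for(
                response._to_object_ref(), timeout=ASSIGNMENT_TIMEOUT_S
            )
        except asyncio.TimeoutError:
            # Give up so the request doesn't occupy a replica later.
            response.cancel()
            raise
        return await obj_ref
```

If the reference does resolve on assignment, this effectively acts as the scheduling timeout being asked for, just implemented at the application layer rather than as a first-class deployment option.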
I was going to suggest that you try this pattern. There is one issue here though: …

How about you start with that, see if it works for your use case, and then we can discuss if/how to add first-class support?
@edoakes Got it. I'll start there when I return. Thanks for the input!
Finally got around to doing some testing. I tried the following:
The issue is that the …
Got some time to look into this today. Turns out the above strategy is broken for …

Digging into it more: since 2.10, Python replicas always return a generator. In order to deal with the new generator return, the …

Hopefully, if I have time in the next few weeks, I can build locally and get to the source of the issue. Removing the await from …

More notes: …
Finally, a script to reproduce the issue:
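The original script is not shown here; what follows is only a hypothetical reconstruction of the kind of reproduction being discussed, assuming a single-replica deployment with a concurrency limit of 1 so that a second request has to queue (the concurrency option is `max_ongoing_requests` in recent Ray Serve releases; older releases call it `max_concurrent_queries`), with made-up timings:

```python
import asyncio
import time

from ray import serve


@serve.deployment(num_replicas=1, max_ongoing_requests=1)
class Slow:
    async def __call__(self) -> str:
        await asyncio.sleep(5)
        return "done"


async def main():
    handle = serve.run(Slow.bind())

    # Occupy the only replica, then send a second request that has to queue.
    first = handle.remote()
    await asyncio.sleep(0.5)
    queued = handle.remote()

    start = time.time()
    try:
        # If the object ref only resolved once a replica was assigned, this
        # should time out (the replica stays busy for ~5 s). If it returns
        # almost immediately, the reference does not reflect assignment.
        await asyncio.wait_for(queued._to_object_ref(), timeout=1.0)
        print(f"ref resolved after {time.time() - start:.2f}s")
    except asyncio.TimeoutError:
        print("timed out waiting for assignment; cancelling the queued request")
        queued.cancel()

    print(await first)


if __name__ == "__main__":
    asyncio.run(main())
```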
Aha. Looks like the source of the issue might be from the … (Line 511 in 7874da9)
(cc @stephanie-wang)
Description
Currently, requests waiting to be scheduled to a replica will retry indefinitely with a given backoff sequence. While we can set an HTTP timeout in the HTTP settings, this can result in a cluster falling too far behind incoming requests and critically failing all requests until it scales appropriately.
It would be nice to set an additional timeout for each deployment representing the maximum time a request can spend waiting to be sent to a replica of that deployment. We can then set this value to (HTTP_TIMEOUT - PROCESSING_TIME) and ensure that we are not wasting any time on requests that are going to time out anyway.
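For reference, a minimal sketch of the budget arithmetic described above, next to the end-to-end HTTP timeout that already exists (assuming the `request_timeout_s` field of Serve's HTTP options); the per-deployment scheduling timeout itself is hypothetical and the numbers are made up:

```python
from ray import serve

HTTP_TIMEOUT_S = 30.0     # end-to-end budget enforced at the proxy (made-up value)
PROCESSING_TIME_S = 5.0   # typical per-request processing time (made-up value)

# Proposed (hypothetical) per-deployment value: stop waiting for a replica once
# the remaining budget could no longer cover processing anyway.
SCHEDULING_TIMEOUT_S = HTTP_TIMEOUT_S - PROCESSING_TIME_S

# The end-to-end timeout that exists today applies to the whole request.
serve.start(http_options={"request_timeout_s": HTTP_TIMEOUT_S})
```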
I'm also curious if anyone has implemented this without a separate dispatcher deployment.
ray/python/ray/serve/_private/router.py, line 290 in a6b6898
Use case
Machine learning inference server with processing times on the order of seconds. There is a failure mode where load increases faster than the cold-start times can cope with, and the server gets stuck working on requests it never finishes.