Allow configuring Serve control loop interval, add related docs #45063
Conversation
Signed-off-by: Josh Karpel <[email protected]>
@@ -56,3 +56,13 @@ proper backpressure. You can increase the value in the deployment decorator; e.g
By default, Serve lets client HTTP requests run to completion no matter how long they take. However, slow requests could bottleneck the replica processing, blocking other requests that are waiting. It's recommended that you set an end-to-end timeout, so slow requests can be terminated and retried.

You can set an end-to-end timeout for HTTP requests by setting the `request_timeout_s` in the `http_options` field of the Serve config. HTTP proxies will wait for that many seconds before terminating an HTTP request. This config is global to your Ray cluster, and it cannot be updated during runtime. Use [client-side retries](serve-best-practices-http-requests) to retry requests that time out due to transient failures.
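For illustration, the timeout described above might appear in a Serve config file like the following sketch (the application name and import path are hypothetical; only `http_options.request_timeout_s` comes from the docs change):

```yaml
http_options:
  request_timeout_s: 30  # terminate HTTP requests after 30 seconds

applications:
  - name: my_app                  # hypothetical
    import_path: my_module:app    # hypothetical
```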
### Give the Serve Controller more time to process requests
Took the liberty of adding a section here in case others run into the same issue. Please feel free to reword as desired, not sure what level of detail you want here :)
# How often to call the control loop on the controller.
CONTROL_LOOP_PERIOD_S = 0.1
# How long to sleep between control loop cycles on the controller.
CONTROL_LOOP_INTERVAL_S = float(os.getenv("RAY_SERVE_CONTROL_LOOP_INTERVAL_S", 0.1))
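As a standalone sketch of the override pattern in the added line (the variable name and default mirror the diff; `os.getenv` returns the environment value as a string, or the given default when the variable is unset, and `float()` then converts either):

```python
import os

# Read the override from the environment, falling back to the 0.1 s default.
# A string default keeps the float() conversion uniform for both cases.
CONTROL_LOOP_INTERVAL_S = float(os.getenv("RAY_SERVE_CONTROL_LOOP_INTERVAL_S", "0.1"))
```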
I thought `INTERVAL` made more sense than `PERIOD` as the name, since it's the time between cycles, not a target for when the next cycle starts.
way better :)
@@ -130,7 +130,7 @@ def replica_queue_length_autoscaling_policy(

# Only actually scale the replicas if we've made this decision for
# 'scale_up_consecutive_periods' in a row.
if decision_counter > int(config.upscale_delay_s / CONTROL_LOOP_PERIOD_S):
if decision_counter > int(config.upscale_delay_s / CONTROL_LOOP_INTERVAL_S):
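The threshold in the condition above scales inversely with the interval; a quick worked sketch (the helper name and sample values are hypothetical, mirroring the expression in the diff):

```python
def upscale_cycle_threshold(upscale_delay_s: float, interval_s: float) -> int:
    # Mirrors int(config.upscale_delay_s / CONTROL_LOOP_INTERVAL_S): the
    # decision counter must exceed this many consecutive cycles to upscale.
    return int(upscale_delay_s / interval_s)

# With a 30 s upscale delay:
print(upscale_cycle_threshold(30.0, 0.5))  # 60 cycles at a 0.5 s interval
print(upscale_cycle_threshold(30.0, 2.0))  # 15 cycles at a 2 s interval
```

A larger interval proportionally lowers the number of consecutive cycles required, but each cycle also represents a longer wall-clock wait.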
Seems like the interval is used in a few other places to count control loop cycles - am I breaking some assumption by allowing it to be configurable to some larger value (e.g., does this still make sense if the loop interval is large)?
I don't believe so -- but @zcin should confirm
I don't think this breaks any assumptions. If the upscale delay < the control loop interval, then the sleeps between cycles already inherently "cover" the required delay, so this code still makes sense.
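The edge case in that comment can be checked directly (values hypothetical): when the delay is shorter than the interval, the integer threshold is 0, so the decision from a single cycle already satisfies the check.

```python
upscale_delay_s = 1.0   # hypothetical: shorter than the interval
interval_s = 5.0        # hypothetical: large configured interval
threshold = int(upscale_delay_s / interval_s)  # int(0.2) == 0
decision_counter = 1    # first cycle on which the upscale decision is made
print(decision_counter > threshold)  # True: one cycle suffices
```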
LGTM pending @zcin chiming in on the autoscaling question
Thanks for the quick reviews! Much appreciated!
Why are these changes needed?
In our experiments, adjusting this value upward helps the Serve Controller keep up with a large number of autoscaling metrics pushes from a large number of `DeploymentHandle`s (because the loop body is blocking, increasing the interval lets more other code run while the control loop isn't running), at the cost of control loop responsiveness (since it doesn't run as often).

Related issue number
Closes #44784 ... for now!
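For anyone landing here from that issue, the override could be applied roughly like this (the variable name comes from the diff; where you export it depends on how you launch the Serve controller process, so treat this as a sketch):

```shell
# Hypothetical: relax the control loop to one cycle per second.
# Must be set in the environment of the Serve controller process.
export RAY_SERVE_CONTROL_LOOP_INTERVAL_S=1.0
echo "$RAY_SERVE_CONTROL_LOOP_INTERVAL_S"
```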
Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.