Allow configuring Serve control loop interval, add related docs #45063

Conversation

@JoshKarpel (Contributor) commented Apr 30, 2024

Why are these changes needed?

In our experiments, adjusting this value upward helps the Serve Controller keep up with a large volume of autoscaling metrics pushes from a large number of DeploymentHandles (because the loop body is blocking, increasing the interval lets other code run while the control loop isn't running), at the cost of control loop responsiveness (since the loop doesn't run as often).
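
For intuition, a minimal sketch of the loop shape being described (hypothetical function names, simplified; not the actual controller code):

import asyncio

def run_one_control_cycle() -> None:
    """Placeholder (hypothetical name) for the controller's blocking per-cycle work."""

async def control_loop(interval_s: float) -> None:
    # Each cycle runs the blocking body, then sleeps for `interval_s` seconds.
    # While the loop sleeps, other coroutines on the controller (e.g., handlers
    # receiving autoscaling metrics pushes from DeploymentHandles) get time to
    # run, which is why a larger interval helps metrics throughput at the cost
    # of the loop reacting less often.
    while True:
        run_one_control_cycle()
        await asyncio.sleep(interval_s)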

Related issue number

Closes #44784 ... for now!

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -56,3 +56,13 @@ proper backpressure. You can increase the value in the deployment decorator; e.g
By default, Serve lets client HTTP requests run to completion no matter how long they take. However, slow requests could bottleneck the replica processing, blocking other requests that are waiting. It's recommended that you set an end-to-end timeout, so slow requests can be terminated and retried.

You can set an end-to-end timeout for HTTP requests by setting the `request_timeout_s` in the `http_options` field of the Serve config. HTTP Proxies will wait for that many seconds before terminating an HTTP request. This config is global to your Ray cluster, and it cannot be updated during runtime. Use [client-side retries](serve-best-practices-http-requests) to retry requests that time out due to transient failures.

### Give the Serve Controller more time to process requests
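
As an aside on the timeout paragraph quoted above (illustration only, not part of this PR's diff): a minimal sketch of where `request_timeout_s` lives in the Serve config, written here as the equivalent Python dict rather than YAML, with an arbitrary 10-second value:

serve_config = {
    "http_options": {
        # HTTP proxies terminate requests that take longer than this many seconds.
        "request_timeout_s": 10.0,
    },
    # ... applications and other fields omitted ...
}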
@JoshKarpel (Contributor, Author) commented:
Took the liberty of adding a section here in case others run into the same issue. Please feel free to reword as desired, not sure what level of detail you want here :)

-# How often to call the control loop on the controller.
-CONTROL_LOOP_PERIOD_S = 0.1
+# How long to sleep between control loop cycles on the controller.
+CONTROL_LOOP_INTERVAL_S = float(os.getenv("RAY_SERVE_CONTROL_LOOP_INTERVAL_S", 0.1))
@JoshKarpel (Contributor, Author) commented:
I thought INTERVAL made more sense than PERIOD as the name, since it's the time between cycles, not a target for when the next cycle starts.

A reviewer (Contributor) replied:
way better :)
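
For reference, a hedged sketch of how a user might override the new default shown in the diff above. The note about process environment is an assumption about where the variable needs to be visible, not something stated in this PR:

import os

# Sleep 1 second between control loop cycles instead of the default 0.1.
# This must be set in the environment of the process that runs the Serve
# controller (for example, exported on the head node before starting Ray);
# setting it only in a driver script may not reach the controller actor.
os.environ["RAY_SERVE_CONTROL_LOOP_INTERVAL_S"] = "1.0"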

@@ -130,7 +130,7 @@ def replica_queue_length_autoscaling_policy(

 # Only actually scale the replicas if we've made this decision for
 # 'scale_up_consecutive_periods' in a row.
-if decision_counter > int(config.upscale_delay_s / CONTROL_LOOP_PERIOD_S):
+if decision_counter > int(config.upscale_delay_s / CONTROL_LOOP_INTERVAL_S):
@JoshKarpel (Contributor, Author) commented:
Seems like the interval is used in a few other places to count control loop cycles - am I breaking some assumption by allowing it to be configurable to some larger value (e.g., does this still make sense if the loop interval is large)?

A reviewer (Contributor) replied:
I don't believe so -- but @zcin should confirm

Another reviewer (Contributor) replied:
I don't think this breaks any assumptions. If upscale delay < control loop interval, then the interval that the controller sleeps for between cycles already inherently "covers" the required delay, so this code still makes sense.
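
A worked example of the `upscale_delay_s / CONTROL_LOOP_INTERVAL_S` arithmetic discussed in this thread (illustrative numbers, not from the PR):

# With upscale_delay_s = 30: a 0.5 s interval requires more than 60 consecutive
# upscale decisions (~30 s of wall clock), a 1.0 s interval requires more than
# 30 (~30 s), and a 60 s interval requires more than int(30 / 60) == 0, i.e. a
# single cycle's decision suffices -- the sleep itself already "covers" the
# delay, as the comment above explains.
upscale_delay_s = 30.0
for interval_s in (0.5, 1.0, 60.0):
    required_cycles = int(upscale_delay_s / interval_s)
    print(f"interval={interval_s} s -> need more than {required_cycles} consecutive cycles")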

@edoakes (Contributor) left a comment:
LGTM pending @zcin chiming in on the autoscaling question

@edoakes merged commit 23d05bb into ray-project:master on Apr 30, 2024; 5 checks passed.
@JoshKarpel deleted the allow-configuring-serve-control-loop-interval branch on April 30, 2024 at 22:53.
@JoshKarpel (Contributor, Author) commented:
Thanks for the quick reviews! Much appreciated!

Successfully merging this pull request may close the following issue: [Serve] Improve scalability of Serve DeploymentHandles