
[serve] fix lightweight update max ongoing requests #45006

Merged on Apr 29, 2024 (1 commit)

Commits on Apr 27, 2024

  1. [serve] fix lightweight update max ongoing requests

    When a lightweight update changes a deployment's `max_ongoing_requests`, two components need to be notified:
    1. Deployment handles, so they stop sending requests to a replica that has reached its maximum
    2. Replicas, so they reject requests once they have reached their maximum
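    To make the two enforcement points concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not Ray Serve's implementation: `Replica`, `Handle`, `try_accept`, `send`, and `refresh_cache` are hypothetical names. The handle filters replicas using a cached queue length that can go stale, while the replica checks its own live counter, so only the replica-side check is strict.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica model (not Ray Serve's API): strictly enforces its own limit."""
    name: str
    max_ongoing_requests: int
    ongoing: int = 0

    def try_accept(self) -> bool:
        # Component (2): the replica checks its live counter and rejects at the limit.
        if self.ongoing >= self.max_ongoing_requests:
            return False
        self.ongoing += 1
        return True


@dataclass
class Handle:
    """Hypothetical handle model: routes using a cache that can be stale."""
    replicas: list
    max_ongoing_requests: int
    # Cached queue lengths; only refreshed when refresh_cache() is called.
    cached_queue_len: dict = field(default_factory=dict)

    def refresh_cache(self) -> None:
        self.cached_queue_len = {r.name: r.ongoing for r in self.replicas}

    def send(self):
        # Component (1): skip replicas the cache believes are already full.
        for r in self.replicas:
            if self.cached_queue_len.get(r.name, 0) < self.max_ongoing_requests:
                if r.try_accept():
                    # Optimistically bump the cached count after a successful send.
                    self.cached_queue_len[r.name] = self.cached_queue_len.get(r.name, 0) + 1
                    return r.name
        return None


replica = Replica("r0", max_ongoing_requests=2)
handle = Handle([replica], max_ongoing_requests=2)
handle.refresh_cache()
# Simulate other traffic filling the replica; this handle's cache is now stale.
replica.ongoing = 2
print(handle.cached_queue_len)  # {'r0': 0}
print(handle.send())            # None: the replica itself rejected the request
```

    The handle's cache still reports the replica as empty, so only the replica's own check stops the request from exceeding the limit.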
    
    Currently we handle (1) but not (2): replicas are not notified of the updated `max_ongoing_requests` on lightweight updates. This matters because (1) is not strict enforcement of `max_ongoing_requests`; it relies on a cache that can be stale. As a result, since replicas never receive the new value, the updated maximum is not fully enforced.
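    The failure mode and the fix can be sketched the same way. In this hypothetical model (`Replica`, `Handle`, `lightweight_update`, and `accepted_count` are illustrative names, not Ray Serve APIs), a lightweight update lowers the limit from 5 to 2. The assumed worst case is that the handle's stale cache lets every request through, leaving the replica as the only strict check.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica that strictly enforces its own limit."""
    max_ongoing_requests: int
    ongoing: int = 0

    def try_accept(self) -> bool:
        if self.ongoing >= self.max_ongoing_requests:
            return False
        self.ongoing += 1
        return True


@dataclass
class Handle:
    """Hypothetical handle holding its own copy of the limit."""
    max_ongoing_requests: int


def lightweight_update(handle, replicas, new_max, notify_replicas=True):
    """Apply a new max_ongoing_requests without restarting replicas (toy model).

    notify_replicas=False models the behavior described before the fix:
    only the handle learns the new value, so the strict replica check is stale.
    """
    handle.max_ongoing_requests = new_max
    if notify_replicas:
        for r in replicas:
            r.max_ongoing_requests = new_max


def accepted_count(notify_replicas, attempts=5):
    replica = Replica(max_ongoing_requests=5)
    handle = Handle(max_ongoing_requests=5)
    lightweight_update(handle, [replica], new_max=2, notify_replicas=notify_replicas)
    # Assumed worst case: the handle's stale cache lets every attempt through,
    # so the replica is the only point that can enforce the limit.
    return sum(replica.try_accept() for _ in range(attempts))


print(accepted_count(notify_replicas=False))  # 5: new limit of 2 is not enforced
print(accepted_count(notify_replicas=True))   # 2: replica rejects beyond the new limit
```

    The `notify_replicas=False` path corresponds to the bug described above; the `True` path corresponds to also propagating the new value to replicas.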
    
    This PR fixes that and updates a test to fully cover this behavior.
    
    Fixes ray-project#44975.
    
    
    Signed-off-by: Cindy Zhang <[email protected]>
    zcin committed Apr 27, 2024
    0316e71