[serve] proxy should ping replica immediately after receiving new actor handle #47036

Closed
zcin opened this issue Aug 8, 2024 · 0 comments · Fixed by #47053
Assignees: zcin
Labels: enhancement (Request for new feature and/or capability) · P1 (Issue that should be fixed within a few weeks) · serve (Ray Serve Related Issue)

zcin commented Aug 8, 2024

When proxies get an updated set of running replicas, they receive the replicas' actor handles from the controller. However, the actor handle itself is only a pointer that contains the actor ID; it doesn't contain all of the actor info, such as the actor address. That info is populated by the GCS upon the first request made through the actor handle.

If the GCS goes down before the proxy gets to make that first request, requests will hang for the duration of the GCS failure. To improve fault tolerance, the router should ping the replica immediately after receiving a new actor handle, so that the actor info is populated while the GCS is still available.
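
To illustrate the idea, here is a minimal, Ray-free sketch of the failure mode and the proposed fix. Everything in it is hypothetical and made up for illustration (`FakeGCS`, `LazyHandle`, `Router`, `update_replicas`, the `ping_on_receive` flag, the address); none of it is Ray's actual API or the implementation in the linked PR. It also models a GCS outage as an immediate `ConnectionError`, whereas in the real system the request would hang.

```python
import asyncio


class FakeGCS:
    """Hypothetical stand-in for the GCS: resolves actor IDs to addresses."""

    def __init__(self, table):
        self.table = table
        self.up = True

    async def lookup(self, actor_id):
        if not self.up:
            # Simplification: the real failure mode is a hang, not an error.
            raise ConnectionError("GCS unavailable")
        return self.table[actor_id]


class LazyHandle:
    """Hypothetical actor handle: holds only the actor ID until first use."""

    def __init__(self, actor_id, gcs):
        self.actor_id = actor_id
        self._gcs = gcs
        self.address = None  # populated by the first call

    async def call(self, method):
        if self.address is None:
            self.address = await self._gcs.lookup(self.actor_id)
        return f"{method}@{self.address}"


class Router:
    """Hypothetical router that optionally pings replicas on every update."""

    def __init__(self):
        self.replicas = {}

    async def update_replicas(self, handles, ping_on_receive=True):
        self.replicas = {h.actor_id: h for h in handles}
        if ping_on_receive:
            # Ping right away so each handle resolves its address now,
            # rather than on the first user request.
            await asyncio.gather(
                *(h.call("ping") for h in handles), return_exceptions=True
            )

    async def route(self, actor_id):
        return await self.replicas[actor_id].call("handle_request")


async def demo(ping_on_receive):
    gcs = FakeGCS({"replica-1": "10.0.0.5:7000"})
    router = Router()
    await router.update_replicas([LazyHandle("replica-1", gcs)], ping_on_receive)
    gcs.up = False  # GCS fails after the update, before the first real request
    try:
        return await router.route("replica-1")
    except ConnectionError:
        return "failed"
```

With `ping_on_receive=True`, the handle already has its address cached when the fake GCS goes down, so the later request goes through. With `ping_on_receive=False`, the first real request is also the first lookup, and it fails.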

zcin added the enhancement, P1, and serve labels Aug 8, 2024
zcin self-assigned this Aug 8, 2024
zcin linked a pull request Aug 13, 2024 that will close this issue