[serve] Add health checking for http proxy actors #34944
Signed-off-by: Cindy Zhang <[email protected]>
Nice work! Left some nits; please make sure the unit tests cover all the statuses :)
Should only be called after confirming the object ref is ready.
Resets _health_check_obj_ref to None at the end.
"""
assert len(ray.wait([self._health_check_obj_ref], timeout=0)[0])
nit: no need to have this assert?
Oops yes, updated
python/ray/serve/schema.py
class HTTPProxyDetails(BaseModel):
    status: HTTPProxyStatus
Can we add node_id and ip_address here?
@alanwguo is there anything else that would be useful to show in the UI?
Oh... maybe the logs path?
Do we have any metrics related to HTTP Proxy today? I don't think so.
If we did, it would be useful to have some sort of identifier for the HTTP Proxy so I could link to, or filter down to, the metrics for a particular HTTP proxy.
For HTTPProxyStatus, do we have any sort of string that can provide more details about errors?
Creation date could also be useful, I guess.
I believe we do have some metrics. The unique identifier would be the node_id, though that doesn't handle restarts if they were to happen. So maybe we should add actor_id here.
I've added metadata for http proxy here: https://github.com/ray-project/ray/pull/34944/files#diff-e454e14e8c39af64fcec6997326bd32012fc91afe5d09d6ac1486aed6199afd1R764-R776
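For readers following this thread without opening the linked diff, here is a rough sketch of the shape being discussed. It is not the code that was merged: it uses a standard-library dataclass as a stand-in for the pydantic BaseModel, the field names node_id, ip_address and actor_id are taken from the comments above, the HEALTHY member is an assumption, and metrics_key is a hypothetical helper that only illustrates why actor_id helps when a proxy restarts on the same node.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class HTTPProxyStatus(str, Enum):
    STARTING = "STARTING"
    HEALTHY = "HEALTHY"  # assumed; only STARTING and UNHEALTHY are named in this PR
    UNHEALTHY = "UNHEALTHY"


@dataclass(frozen=True)
class HTTPProxyDetails:
    """Illustrative stand-in for the schema model; fields follow the review thread."""

    node_id: str
    ip_address: str
    actor_id: str
    status: HTTPProxyStatus


def metrics_key(details: HTTPProxyDetails) -> Tuple[str, str]:
    """Hypothetical identifier for filtering metrics to one proxy instance.

    node_id alone stays the same if the proxy actor is restarted on the
    same node, so actor_id is included to tell the instances apart.
    """
    return (details.node_id, details.actor_id)
```

With this shape, two proxy instances on the same node share a node_id but produce different keys, which is the restart case raised above.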
This is awesome, thanks for picking it up!
Add health checking to HTTP Proxy
- Add no-op check_health() method to proxy actor
- In controller, periodically call check_health() method
- Before the first health check returns, status is STARTING
- If an error occurs, set UNHEALTHY
- If request times out, set UNHEALTHY
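The status transitions listed above can be sketched roughly as follows. This is a minimal illustration, not the controller code from this PR: it uses concurrent.futures.Future as a stand-in for a Ray object ref, and ProxyHealthTracker, start_check, timeout_s, period_s and the HEALTHY status are all assumed names introduced for the example.

```python
import time
from concurrent.futures import Future
from enum import Enum
from typing import Callable, Optional


class ProxyStatus(str, Enum):
    STARTING = "STARTING"
    HEALTHY = "HEALTHY"  # assumed; the PR text names only STARTING and UNHEALTHY
    UNHEALTHY = "UNHEALTHY"


class ProxyHealthTracker:
    """Hypothetical controller-side tracker for a single proxy."""

    def __init__(
        self,
        start_check: Callable[[], Future],  # stand-in for calling check_health()
        timeout_s: float,
        period_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._start_check = start_check
        self._timeout_s = timeout_s
        self._period_s = period_s
        self._clock = clock
        self.status = ProxyStatus.STARTING  # until the first check returns
        self._pending: Optional[Future] = None
        self._started_at = 0.0
        self._last_check: Optional[float] = None

    def update(self) -> ProxyStatus:
        """Called from the controller loop; never blocks."""
        now = self._clock()
        if self._pending is not None:
            if self._pending.done():
                # The check returned: an exception means UNHEALTHY.
                failed = self._pending.exception() is not None
                self.status = ProxyStatus.UNHEALTHY if failed else ProxyStatus.HEALTHY
                self._pending = None  # reset once the result is consumed
            elif now - self._started_at > self._timeout_s:
                # The check did not return in time.
                self.status = ProxyStatus.UNHEALTHY
                self._pending = None
        due = self._last_check is None or now - self._last_check >= self._period_s
        if self._pending is None and due:
            self._pending = self._start_check()
            self._started_at = now
            self._last_check = now
        return self.status
```

The tracker only polls the pending result, so a slow or hung proxy cannot stall the loop that calls update(); the timeout branch is what turns a hung check into UNHEALTHY.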
Why are these changes needed?
Add health checking to HTTP Proxy
- Add no-op check_health() method to proxy actor
- In controller, periodically call check_health() method
Related issue number
Closes #19151
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.