Add healthcheck support for JetStream #90

Merged: 4 commits, May 29, 2024


Contributor

@vivianrwu vivianrwu commented May 24, 2024

This PR adds a healthcheck endpoint to JetStream.
The endpoint reports whether the model server is alive, based on the self.live field set here: https://github.com/google/JetStream/blob/e19a7906d8cdf1cae658a4c7c4f6f516aade49f9/jetstream/core/orchestrator.py#L377

The healthcheck endpoint can be invoked via the following:

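# `stub` is assumed to be an async gRPC stub for the JetStream orchestrator service.
healthcheck_request = jetstream_pb2.HealthCheckRequest()
# The call returns an awaitable; await it to obtain the response message.
healthcheck_response = await stub.HealthCheck(healthcheck_request)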

healthcheck_response: {
  is_live: bool
}

Test coverage is in test_server.py, which calls the endpoint once the server has started.
The endpoint is intended for model server liveness and readiness checks.
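
As a rough illustration of how a liveness probe might wrap this call, here is a minimal sketch. It does not use the real JetStream stub: `FakeStub` and `FakeHealthCheckResponse` are hypothetical stand-ins that mimic only the `HealthCheck` call and the `is_live` field described above, and `is_server_live` and its `timeout_s` parameter are illustrative names, not part of this PR.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeHealthCheckResponse:
    """Hypothetical stand-in for the HealthCheck response message."""
    is_live: bool


class FakeStub:
    """Hypothetical stand-in for the async gRPC stub."""

    def __init__(self, live: bool, delay_s: float = 0.0):
        self._live = live
        self._delay_s = delay_s

    async def HealthCheck(self, request):
        # Simulate server latency before answering.
        await asyncio.sleep(self._delay_s)
        return FakeHealthCheckResponse(is_live=self._live)


async def is_server_live(stub, request, timeout_s: float = 1.0) -> bool:
    """Return True only if the server answers in time and reports is_live."""
    try:
        response = await asyncio.wait_for(stub.HealthCheck(request), timeout_s)
    except asyncio.TimeoutError:
        # Assumption: no answer within the deadline is treated as not live.
        return False
    return bool(response.is_live)


if __name__ == "__main__":
    # With the real stub, `object()` would be jetstream_pb2.HealthCheckRequest().
    print(asyncio.run(is_server_live(FakeStub(live=True), object())))
```

Against a real server, a probe would likely also need to handle gRPC connection errors; that is left out here to keep the sketch short.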


liurupeng commented May 28, 2024

@vivianrwu could you add a description of why we are adding this PR, as well as the test steps? Thanks!

Collaborator

@JoeZijunZhou JoeZijunZhou left a comment

LGTM! After fixing the CI checks, we are good to go!

"LLM orchestrator is being used in offline test mode, and will not"
" respond to gRPC queries - only direct function calls."
)
is_live = self._driver.live
Collaborator

Can you share where the driver sets the live status?

Contributor Author

Collaborator

Thanks for sharing it!

@JoeZijunZhou JoeZijunZhou merged commit 0c56aac into AI-Hypercomputer:main May 29, 2024
3 checks passed
4 participants