fix: Improve log warnings in REST API /health endpoint #5381
Validated against CUDA driver version 470.XX, which has the known pynvml issue described here.
Can confirm that the error is not logged 🚀 Thank you for taking care of it.
@ArzelaAscoIi would you please test this version?
Works as expected 👍 Thank you!
@anakin87 the existing unit test related to /health checkpoint
@vblagoje please go ahead and merge!
* Improve warning in REST APIs get_health_status method
* Convert log message
* A better solution and documentation
* Add another nested try/except block
* Simplify
What?
In this PR, we improve the log warning in the REST API /health endpoint. Specifically, we change the warning message from "No NVIDIA GPU found." to "Couldn't collect GPU stats: ", followed by the details of the underlying error.
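The idea behind the new message can be illustrated with a minimal sketch. This is not the code from the PR: `collect_gpu_stats`, `query_gpus`, and `failing_query` are hypothetical names, and the driver query is passed in as a plain callable so the example runs without a GPU or pynvml installed. It only shows the pattern of logging the actual failure reason instead of asserting that no GPU exists.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("health_sketch")


def collect_gpu_stats(query_gpus: Callable[[], List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Return per-GPU stats, or an empty list if they cannot be collected.

    `query_gpus` is a hypothetical stand-in for the real driver query
    (for example, a helper built on pynvml).
    """
    try:
        return query_gpus()
    except Exception as exc:
        # Report what actually went wrong rather than claiming no GPU exists:
        # the query can fail on a machine that does have a GPU.
        logger.warning("Couldn't collect GPU stats: %s", exc)
        return []


def failing_query() -> List[Dict[str, Any]]:
    # Simulates a driver/library mismatch on a machine with a GPU.
    raise RuntimeError("driver call not supported")
```

With this shape, a failing query yields an empty stats list and a warning that names the cause, while a working query passes its results through unchanged.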
Why?
The current warning message is misleading and may cause confusion among users, potentially leading to unnecessary new issues being created. By giving a more descriptive and accurate warning, we aim to provide a better user experience and minimize confusion.
How can it be used?
This change is internal and does not affect the way users interact with the REST API /health endpoint. The improvement will only affect the content of the log messages generated when a request is made to this endpoint.
How did you test it?
The testing for this change is still ongoing. Manual testing is currently being conducted to ensure the updated log warning is accurate and provides the desired clarity.
Notes for the reviewer
Do not integrate yet. This PR is issued to ease communication between the stakeholders involved.