Improved framework messaging for POD failures #666

Open
aradhakrishnanGFDL opened this issue Aug 16, 2024 · 2 comments
Labels: suggestion (ideas to improve or extend code)

Comments

@aradhakrishnanGFDL
Collaborator

What problem will this feature solve?
Improved messaging so new users can help with triaging.
Describe the solution you'd like
A suggestion for framework messaging, based on the ff (forcing_feedback) POD exercise: at the bottom of the error output, explicitly print helpful messages that point users to the log files (with their paths) and hint at possible causes, either within the POD or in data preprocessing, depending on the case. Some version of this, or a pointer to the Sphinx docs where these cases are elaborated, would be helpful. Something to consider.

@aradhakrishnanGFDL added the suggestion label Aug 16, 2024
@wrongkindofdoctor
Collaborator

wrongkindofdoctor commented Aug 16, 2024

@aradhakrishnanGFDL It is probably beneficial to advise POD developers to use detailed print statements. The framework cannot capture logging info directly from the subprocessruntimemanager without more sophisticated logging features, which may be a large ask for POD developers to implement. The framework can only tell whether the POD subprocess fails or succeeds, and the subprocessruntimemanager messaging reflects this. Any POD logging goes through a logger attached to the pod object and is limited to whatever the POD prints to the terminal, plus the final POD stack trace.
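A minimal sketch of the subprocess behavior described above, in Python. It assumes the POD runs as a plain command; `run_pod` and the log file name are hypothetical and are not the framework's actual subprocessruntimemanager API. Everything the POD writes to the terminal, including its final stack trace, goes to one log file, and the caller learns only success or failure from the exit status.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def run_pod(cmd: list[str], log_path: Path) -> bool:
    """Run a POD command and capture all of its terminal output in log_path.

    Hypothetical helper: returns True if the subprocess exited with status 0.
    The exit status is the only signal the caller gets about what happened.
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w") as log:
        # Merge stderr into stdout so the POD's print statements and its
        # final stack trace both land in the same log file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in POD that prints a progress message and then fails.
    pod_cmd = [sys.executable, "-c", "print('computing IRF maps'); raise RuntimeError('bad input')"]
    log_file = Path(tempfile.mkdtemp()) / "forcing_feedback.log"
    if not run_pod(pod_cmd, log_file):
        print(f"POD failed; see {log_file} for its output and stack trace")
```

This is why detailed print statements matter: the log file is the only place the POD's own diagnostics end up.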

@aradhakrishnanGFDL
Collaborator Author

Yes, adding this to the POD developer best practices would be nice. The POD in question did have a print statement. My proposal is not to give a detailed error message, but to give users some pointers after a deactivation like the one below, assuming MDTF regains control after the subprocess call (e.g., when it crawls through the output). Just a simple "Please refer to this documentation on how to find the log files and troubleshoot the issue." Does that make sense?

```
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_IRF.png'.
ERROR: Deactivated <#1BUm:forcing_feedback> due to MDTFFileNotFoundError("[Errno 2] No such file or directory: 'Missing 11 files.'").
```
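Assuming, as proposed, that MDTF regains control after the subprocess call, the hint could be assembled from information the framework already has at deactivation time: the POD name, the log file path, and the list of missing outputs. The sketch below is a hypothetical Python helper (`deactivation_hint` is not part of MDTF), and the docs reference and log path in the example are placeholders, not the project's actual locations.

```python
from __future__ import annotations

from pathlib import Path


def deactivation_hint(pod_name: str, log_path: Path, missing_files: list[str], docs_ref: str) -> str:
    """Build a short troubleshooting footer to print after a POD is deactivated.

    Hypothetical helper; docs_ref is whatever pointer the project chooses
    (e.g. a link to a troubleshooting page in the Sphinx docs).
    """
    lines = [
        f"HINT: POD '{pod_name}' was deactivated.",
        f"  Check the POD log for its print output and final stack trace: {log_path}",
    ]
    if missing_files:
        lines.append(f"  {len(missing_files)} expected output file(s) were not produced, e.g. {missing_files[0]}")
        lines.append("  Possible causes: an error inside the POD script, or a problem in data preprocessing.")
    lines.append(f"  For help finding and reading log files, see: {docs_ref}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values; the real log location and docs link are not specified here.
    print(deactivation_hint(
        "forcing_feedback",
        Path("WORK_DIR/forcing_feedback/forcing_feedback.log"),
        ["$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_IRF.png"],
        "<troubleshooting page in the Sphinx docs>",
    ))
```

Printing this immediately after the `Deactivated ...` error keeps the pointer at the bottom of the output, as the original suggestion asks.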
