Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field for run/task error messages (beyond stderr) #195

Open
cjllanwarne opened this issue Oct 18, 2022 · 8 comments
Open

Field for run/task error messages (beyond stderr) #195

cjllanwarne opened this issue Oct 18, 2022 · 8 comments

Comments

@cjllanwarne
Copy link

cjllanwarne commented Oct 18, 2022

Looking at the WES spec I cannot find a way to return an error message explaining why a run or task failed.

There is a field for stderr but that has different semantics than a task error or run error, so doesn't quite fit.

As an example, imagine a task looks good but emits some logging information to its stderr. It then fails because a file cannot be delocalized successfully.

The natural semantics of the stderr field is that it should come directly from the task, but that leaves no place to put the overall "reason the task is a failure", which is an engine level delocalization message.

Another motivating use case at the run level: say a workflow script is badly formed. We want to be able to mark that run as failed, and give it an error message, but that error has nothing to do with stderr and thus that field is inappropriate.

Recommendation:

  • Add to both RunLog and TaskLog an additional, and optional, errors field, containing a list of strings describing why the engine decided to mark that run or task as failed.
@cjllanwarne cjllanwarne changed the title Field for run/task error messages beyond stderr Field for run/task error messages (beyond stderr) Oct 18, 2022
@uniqueg
Copy link
Contributor

uniqueg commented Oct 18, 2022

Thanks @cjllanwarne, an interesting suggestion.

However, on the task level, I'm not quite sure I understand why you think that stderr doesn't fit. In my understanding, if a task fails, it is up to the individual tool to let the user know what went wrong, typically via the STDERR stream. A workflow engine would have no (semantic) understanding of what exactly went wrong (and neither has the platform executing the task/container), so all they do is log that there has been an error, together with the tool's STDERR stream and exit status. Or am I missing something?

On the WES side, in such a scenario, I believe RunLog.state should be set to EXECUTION_ERROR, the tool's STDERR stream should be made available in that tool's RunLog.task_logs.$.stderr property, and the workflow's STDERR stream should be made available in the run's RunLog.run_log.stderr property (and the exit_codes set for both task and workflow).

Things may be different for workflow engine errors, as the engine should know, e.g., about a badly formed workflow or issues with the task execution engines, and could therefore report errors more directly/explicitly along the lines you mention. WES wrappers could then parse the engine logs to populate such a field (or perhaps if an engine is aware that it is behind a WES, it could inform the WES more directly). And I could totally see that being useful (certainly more user friendly than showing the last n lines of the engine's STDERR stream).

@cjllanwarne
Copy link
Author

So one example might be that a cloud executor kills a task because it ran out of memory - but the task itself did not emit any stdout, it just got killed.

In that case, the task error message could be something like "task killed by supervisor", and stderr should be the empty string.

@wleepang
Copy link
Contributor

Rather than an errors list of strings, how about a statusMessage optional string property? This broadens the scope of the message to handle statuses other than EXECUTOR_ERROR at the run level or FAILED at the task level.

@cjllanwarne
Copy link
Author

cjllanwarne commented Oct 24, 2022

A statusMessage could work too, as long as we could reliably choose to forward it to users as an error if the workflow or task failed.

To be honest, I find APIs where fields have a single, clear purpose to be much easier to use than APIs with "cover all bases" fields, because "cover all bases" fields tend to need human understanding to use, whereas simple clear fields are easy to build automations around.

@uniqueg
Copy link
Contributor

uniqueg commented Oct 31, 2022

I agree that clear semantics are important, but I think if cross-referencing another field resolves any ambiguity, then I think it's fine to overload a field. And checking the workflow's or task's state would provide the necessary information to put statusMessages in context.

Still, I'm wondering: Are we only talking about SYSTEM_ERRORs here? I still don't quite see how, in the absence of standardized error reporting, we could have WES and TES servers understand workflow engine or tool errors semantically. And so I don't see how they could provide errors or statusMessages field for these errors. The examples @cjllanwarne gave all sound like SYSTEM_ERRORS to me. Conversely, stderr should be the right place to find out more details about EXECUTION_ERRORs, so I suppose that the suggested change would complement this very well, and this distinction could potentially also help address #179.

@cjllanwarne
Copy link
Author

cjllanwarne commented Nov 15, 2022

Are we only talking about SYSTEM_ERRORs here?

Yes. But just because an error is a system error doesn't mean we don't want error message information for it. And having an error message from the system and a stderr print out from the tool itself are separately useful things to have.

@uniqueg
Copy link
Contributor

uniqueg commented Nov 28, 2022

Absolutely agree with you. Just wanted to clarify.

Anyway, I certainly would like to see a PR for this, in case you are interested to raise one @cjllanwarne :) Guess that would make any remaining discussions more focused (if any)

@patmagee
Copy link
Contributor

patmagee commented Apr 6, 2023

@cjllanwarne would #205 address some of your concerns at least at the task level?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants