Field for run/task error messages (beyond stderr) #195

cjllanwarne · 2022-10-18T16:40:10Z

Looking at the WES spec I cannot find a way to return an error message explaining why a run or task failed.

There is a field for stderr but that has different semantics than a task error or run error, so doesn't quite fit.

As an example, imagine a task looks good but emits some logging information to its stderr. It then fails because a file cannot be delocalized successfully.

The natural semantics of the stderr field is that it should come directly from the task, but that leaves no place to put the overall "reason the task is a failure", which is an engine level delocalization message.

Another motivating use case at the run level: say a workflow script is badly formed. We want to be able to mark that run as failed, and give it an error message, but that error has nothing to do with stderr and thus that field is inappropriate.

Recommendation:

Add to both RunLog and TaskLog an additional, and optional, errors field, containing a list of strings describing why the engine decided to mark that run or task as failed.

The text was updated successfully, but these errors were encountered:

uniqueg · 2022-10-18T20:03:24Z

Thanks @cjllanwarne, an interesting suggestion.

However, on the task level, I'm not quite sure I understand why you think that stderr doesn't fit. In my understanding, if a task fails, it is up to the individual tool to let the user know what went wrong, typically via the STDERR stream. A workflow engine would have no (semantic) understanding of what exactly went wrong (and neither has the platform executing the task/container), so all they do is log that there has been an error, together with the tool's STDERR stream and exit status. Or am I missing something?

On the WES side, in such a scenario, I believe RunLog.state should be set to EXECUTION_ERROR, the tool's STDERR stream should be made available in that tool's RunLog.task_logs.$.stderr property, and the workflow's STDERR stream should be made available in the run's RunLog.run_log.stderr property (and the exit_codes set for both task and workflow).

Things may be different for workflow engine errors, as the engine should know, e.g., about a badly formed workflow or issues with the task execution engines, and could therefore report errors more directly/explicitly along the lines you mention. WES wrappers could then parse the engine logs to populate such a field (or perhaps if an engine is aware that it is behind a WES, it could inform the WES more directly). And I could totally see that being useful (certainly more user friendly than showing the last n lines of the engine's STDERR stream).

cjllanwarne · 2022-10-21T19:22:36Z

So one example might be that a cloud executor kills a task because it ran out of memory - but the task itself did not emit any stdout, it just got killed.

In that case, the task error message could be something like "task killed by supervisor", and stderr should be the empty string.

wleepang · 2022-10-22T16:45:01Z

Rather than an errors list of strings, how about a statusMessage optional string property? This broadens the scope of the message to handle statuses other than EXECUTOR_ERROR at the run level or FAILED at the task level.

cjllanwarne · 2022-10-24T14:20:31Z

A statusMessage could work too, as long as we could reliably choose to forward it to users as an error if the workflow or task failed.

To be honest, I find APIs where fields have a single, clear purpose to be much easier to use than APIs with "cover all bases" fields, because "cover all bases" fields tend to need human understanding to use, whereas simple clear fields are easy to build automations around.

uniqueg · 2022-10-31T23:46:04Z

I agree that clear semantics are important, but I think if cross-referencing another field resolves any ambiguity, then I think it's fine to overload a field. And checking the workflow's or task's state would provide the necessary information to put statusMessages in context.

Still, I'm wondering: Are we only talking about SYSTEM_ERRORs here? I still don't quite see how, in the absence of standardized error reporting, we could have WES and TES servers understand workflow engine or tool errors semantically. And so I don't see how they could provide errors or statusMessages field for these errors. The examples @cjllanwarne gave all sound like SYSTEM_ERRORS to me. Conversely, stderr should be the right place to find out more details about EXECUTION_ERRORs, so I suppose that the suggested change would complement this very well, and this distinction could potentially also help address #179.

cjllanwarne · 2022-11-15T18:46:31Z

Are we only talking about SYSTEM_ERRORs here?

Yes. But just because an error is a system error doesn't mean we don't want error message information for it. And having an error message from the system and a stderr print out from the tool itself are separately useful things to have.

uniqueg · 2022-11-28T09:12:39Z

Absolutely agree with you. Just wanted to clarify.

Anyway, I certainly would like to see a PR for this, in case you are interested to raise one @cjllanwarne :) Guess that would make any remaining discussions more focused (if any)

patmagee · 2023-04-06T16:37:52Z

@cjllanwarne would #205 address some of your concerns at least at the task level?

cjllanwarne changed the title ~~Field for run/task error messages beyond stderr~~ Field for run/task error messages (beyond stderr) Oct 18, 2022

patmagee mentioned this issue Mar 31, 2023

Add unique identifier and additional information to tasks #204

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Field for run/task error messages (beyond stderr) #195

Field for run/task error messages (beyond stderr) #195

cjllanwarne commented Oct 18, 2022 •

edited

Loading

uniqueg commented Oct 18, 2022 •

edited

Loading

cjllanwarne commented Oct 21, 2022

wleepang commented Oct 22, 2022

cjllanwarne commented Oct 24, 2022 •

edited

Loading

uniqueg commented Oct 31, 2022

cjllanwarne commented Nov 15, 2022 •

edited

Loading

uniqueg commented Nov 28, 2022

patmagee commented Apr 6, 2023

Field for run/task error messages (beyond stderr) #195

Field for run/task error messages (beyond stderr) #195

Comments

cjllanwarne commented Oct 18, 2022 • edited Loading

uniqueg commented Oct 18, 2022 • edited Loading

cjllanwarne commented Oct 21, 2022

wleepang commented Oct 22, 2022

cjllanwarne commented Oct 24, 2022 • edited Loading

uniqueg commented Oct 31, 2022

cjllanwarne commented Nov 15, 2022 • edited Loading

uniqueg commented Nov 28, 2022

patmagee commented Apr 6, 2023

cjllanwarne commented Oct 18, 2022 •

edited

Loading

uniqueg commented Oct 18, 2022 •

edited

Loading

cjllanwarne commented Oct 24, 2022 •

edited

Loading

cjllanwarne commented Nov 15, 2022 •

edited

Loading