Added a required input and output format for workflow engines. #357

Merged
merged 6 commits into from
Mar 30, 2020
Changes from 5 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,8 @@ Keep the changelog pleasant to read in the text editor:

version 2.0.0
---------------------------
+ Added a required input and output format for workflow engines.
[PR 357](https://github.com/openwdl/wdl/pull/357)

+ The input specification has been clarified.
[PR 314](https://github.com/openwdl/wdl/pull/314) by @geoffjentry.
94 changes: 91 additions & 3 deletions versions/development/SPEC.md
@@ -2896,9 +2896,15 @@ Note that because some call inputs are left unsatisfied, this workflow could not

Once workflow inputs are computed (see previous section), the value for each of the fully-qualified names needs to be specified per invocation of the workflow. The format of workflow inputs is implementation specific.

### Cromwell-style Inputs
### Common WDL JSON Input Format

The "Cromwell-style" input format is widely supported by WDL implementations and recommended for portability purposes. In the Cromwell-style format, workflow inputs are specified as key/value pairs in JSON or YAML. The mapping to WDL values is codified in the [serialization of task inputs](#serialization-of-task-inputs) section.
The Common WDL JSON input format is an input format that is required to be
accepted by any workflow engine. It is the input format that is used for
performing language compliance tests on the engine.

In this common format, workflow inputs are
Collaborator

This should be usable for both workflow and task inputs/outputs.

Contributor Author

This is in the workflow inputs chapter. Can you clarify what you mean by task inputs in this context?

Collaborator

@jdidion, Feb 24, 2020

Some workflow engines (miniwdl, dxWDL) support running individual tasks in addition to workflows. This input specification should be the same for both workflows and tasks, and it should be required for any workflow engine that allows running tasks.

Contributor

I think this is good. My only comment is to consider shortening the concept name from Common WDL JSON Input Format to JSON Input Format. It makes it more readable, and reduces potential confusion with the "Common Workflow Language". After all, this document is the authoritative document on the WDL language, so the words common and WDL should be superfluous. Any changes an engine makes are not standard/common by definition.

Contributor Author

@orodeh Very good suggestion. It was also a bit tedious to type.

@jdidion, running individual tasks is a feature that is not listed in the spec. So workflow engines can do as they please with regards to running tasks. We can of course make running tasks part of the spec, but that will be another PR entirely.

specified as key/value pairs in JSON. The mapping to WDL values is codified in
the [serialization of task inputs](#serialization-of-task-inputs) section.

In JSON, the inputs to the workflow in the previous section might be:

@@ -2917,7 +2923,89 @@ In JSON, the inputs to the workflow in the previous section might be:
}
```

It's important to note that the type in JSON must be coercible to the WDL type. For example `wf.int_val` expects an integer, but if we specified it in JSON as `"wf.int_val": "three"`, this coercion from string to integer is not valid and would result in a coercion error. See the section on [Type Coercion](#type-coercion) for more details.
It's important to note that the type in JSON must be coercible to the WDL type.
For example `wf.int_val` expects an integer, but if we specified it in JSON as
`"wf.int_val": "three"`, this coercion from string to integer is not valid and
would result in a coercion error. See the section on
[Type Coercion](#type-coercion) for more details.
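
The coercion requirement can be illustrated with a minimal Python sketch. The helper `coerce_json_value`, the `CoercionError` class, and the small set of handled types are hypothetical and cover only a few primitive WDL types; the [Type Coercion](#type-coercion) section remains the authority on which coercions are valid.

```python
import json


class CoercionError(ValueError):
    """Raised when a JSON value cannot be coerced to the requested WDL type."""


def coerce_json_value(name, value, wdl_type):
    """Coerce a decoded JSON value to a primitive WDL type (illustrative only)."""
    # bool is a subclass of int in Python, so track it before the numeric checks.
    is_bool = isinstance(value, bool)
    if wdl_type == "Boolean" and is_bool:
        return value
    if wdl_type == "Int" and isinstance(value, int) and not is_bool:
        return value
    # Assumption: a JSON integer is acceptable where a Float is expected.
    if wdl_type == "Float" and isinstance(value, (int, float)) and not is_bool:
        return float(value)
    # Assumption: File inputs are given as path strings.
    if wdl_type in ("String", "File") and isinstance(value, str):
        return value
    raise CoercionError(
        f"{name}: cannot coerce JSON value {value!r} to WDL type {wdl_type}"
    )


# Hypothetical declared input types for a workflow named `wf`.
declared = {"wf.int_val": "Int"}

good = json.loads('{"wf.int_val": 3}')
bad = json.loads('{"wf.int_val": "three"}')

print(coerce_json_value("wf.int_val", good["wf.int_val"], declared["wf.int_val"]))

try:
    coerce_json_value("wf.int_val", bad["wf.int_val"], declared["wf.int_val"])
except CoercionError as err:
    print(f"coercion error: {err}")
```

A real engine would also need to handle compound types such as `Array` and `Map`, which this sketch leaves out.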

### Other input formats

In addition to the Common WDL JSON input format, engines can support any other
format. The only requirement is that they provide a tool or documentation to
convert these custom formats into the Common WDL JSON input format, so users can
share their workflow inputs across the WDL ecosystem.

# Computing outputs

## Computing workflow outputs

Every engine should have an option to provide the outputs, either as a file,
through an API, or on stdout. The output format should be composed according to
the following rules:

* Workflow outputs are defined in the output section of the workflow.
* A workflow without an output section has no outputs.
* An engine may optionally provide outputs not listed in the output section
(i.e. task and subworkflow outputs) for debugging purposes, if explicitly
requested by the user.

The above rules only apply to output formats delivered by the engine. Listing
subworkflow and task outputs in logs is not covered by these rules.
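
The rules above can be sketched in Python as a simple filter. The function `select_outputs`, its parameters, and the sample names are hypothetical and only illustrate the selection logic, not any engine's actual interface.

```python
def select_outputs(declared_outputs, all_values, include_debug=False):
    """Return the values an engine would report for a workflow run.

    declared_outputs: fully-qualified names from the workflow's output section
                      (empty when the workflow has no output section).
    all_values: every value the run produced, keyed by fully-qualified name,
                including task and subworkflow outputs.
    include_debug: set only when the user explicitly requests extra outputs.
    """
    if include_debug:
        return dict(all_values)
    return {name: all_values[name] for name in declared_outputs}


# Hypothetical run: one declared workflow output and one task-level value.
all_values = {
    "example.read_count": 50157187,
    "example.readcounter.result": 50157187,
}

print(select_outputs(["example.read_count"], all_values))
print(select_outputs([], all_values))
print(select_outputs([], all_values, include_debug=True))
```

The second call models a workflow without an output section, which yields no outputs.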

## Output formats

### Common WDL JSON Output Format

Every workflow engine should offer an option to emit the Common WDL JSON
output format. This output will be used for performing language compliance tests
on the engine.

In Common WDL JSON output format the outputs are specified as key/value pairs in
JSON. Only outputs from the workflow are provided. Sub-workflow and task outputs
are not provided.

Output names will be fully qualified. See the [chapter on fully qualified
names](#fully-qualified-names--namespaced-identifiers) for more details.

WDL values will be coerced to their corresponding JSON types as described in
the chapter on [type coercion](#type-coercion).

Example for the following workflow:


```WDL
workflow example {
  ...

  output {
    String foo = cafetaria.inn
    File analysis_results = analysis.results
    Int read_count = readcounter.result
    Float kessel_run_parsecs = trip_to_space.distance
    Boolean sample_swap_detected = array_concordance.concordant
    Array[File] sample_variants = variant_calling.vcfs
    Map[String, Int] droids = escape_pod.cargo
  }
}
```

The Common WDL JSON output could look like this:
```JSON
{
  "example.foo": "bar",
  "example.analysis_results": "/path/to/my/analysis/results.txt",
  "example.read_count": 50157187,
  "example.kessel_run_parsecs": 11.98,
  "example.sample_swap_detected": false,
  "example.sample_variants": ["/data/patient1.vcf", "/data/patient2.vcf"],
  "example.droids": {"C": 3, "D": 2, "P": 0, "R": 2}
}
```
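
As a rough illustration of how such a document might be produced, the Python sketch below prefixes each output name with the workflow name and serializes the result. The function `outputs_to_json` is hypothetical, and the sketch assumes that `File` values are already held as path strings and that a `Map[String, Int]` is held as a dictionary, matching the example above.

```python
import json


def outputs_to_json(workflow_name, outputs):
    """Serialize workflow outputs as JSON with fully-qualified keys."""
    qualified = {
        f"{workflow_name}.{name}": value for name, value in outputs.items()
    }
    return json.dumps(qualified, indent=2)


# Values taken from the example above; File values are plain path strings.
outputs = {
    "foo": "bar",
    "read_count": 50157187,
    "kessel_run_parsecs": 11.98,
    "sample_swap_detected": False,
    "sample_variants": ["/data/patient1.vcf", "/data/patient2.vcf"],
    "droids": {"C": 3, "D": 2, "P": 0, "R": 2},
}

print(outputs_to_json("example", outputs))
```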

### Other output formats

Engines may support any other output formats.

# Type Coercion
