Update wdl_runner to support pipelines v2 (#261)
* Pipelines v2 wdl_runner integration
kdbinder authored and geoffjentry committed Nov 29, 2018
1 parent 90881fc commit df117eb
Showing 7 changed files with 33 additions and 45 deletions.
4 changes: 2 additions & 2 deletions runners/cromwell_on_google/wdl_runner/Dockerfile
@@ -48,8 +48,8 @@ RUN chmod u+x /wdl_runner/wdl_runner.sh
# Copy Cromwell and the Cromwell conf template
RUN mkdir /cromwell
RUN cd /cromwell && \
- curl -L -O https://github.com/broadinstitute/cromwell/releases/download/29/cromwell-29.jar
- RUN ln /cromwell/cromwell-29.jar /cromwell/cromwell.jar
+ curl -L -O https://github.com/broadinstitute/cromwell/releases/download/36/cromwell-36.jar
+ RUN ln /cromwell/cromwell-36.jar /cromwell/cromwell.jar
COPY jes_template.conf /cromwell/

# Set up the runtime environment
20 changes: 10 additions & 10 deletions runners/cromwell_on_google/wdl_runner/README.md
@@ -5,7 +5,7 @@
This example demonstrates running a multi-stage workflow on
Google Cloud Platform.

- * The workflow is launched with the Google Genomics [Pipelines API](https://cloud.google.com/genomics/v1alpha2/pipelines).
+ * The workflow is launched with the Google Genomics [Pipelines API](https://cloud.google.com/genomics/docs/quickstart).
* The workflow is defined using the Broad Institute's
[Workflow Definition Language](https://software.broadinstitute.org/wdl/) (WDL).
* The workflow stages are orchestrated by the Broad Institute's
@@ -55,7 +55,7 @@ The code in the wdl_runner Docker image includes:

* [OpenJDK 8](http://openjdk.java.net/projects/jdk8/) runtime engine (JRE)
* [Python 2.7](https://www.python.org/download/releases/2.7/) interpreter
- * [Cromwell release 29](https://github.com/broadinstitute/cromwell/releases/tag/29)
+ * [Cromwell release 36](https://github.com/broadinstitute/cromwell/releases/tag/36)
* [Python and shell scripts from this repository](.)

Take a look at the [Dockerfile](./Dockerfile) for full details.
@@ -67,7 +67,7 @@ Take a look at the [Dockerfile](./Dockerfile) for full details.
2. Enable the Genomics, Cloud Storage, and Compute Engine APIs on a new
or existing Google Cloud Project using the [Cloud Console](https://console.cloud.google.com/flows/enableapi?apiid=genomics,storage_component,compute_component&redirect=https://console.cloud.google.com)

- 3. Follow the Google Genomics [getting started instructions](https://cloud.google.com/genomics/install-genomics-tools#install-genomics-tools) to install and authorize the Google Cloud SDK.
+ 3. Follow the Google Genomics [getting started instructions](https://cloud.google.com/genomics/docs/quickstart) to install and authorize the Google Cloud SDK.

4. Follow the Cloud Storage instructions for [Creating Storage Buckets](https://cloud.google.com/storage/docs/creating-buckets) to create a bucket for workflow output and logging

@@ -134,12 +134,12 @@ docker:
gcloud \
alpha genomics pipelines run \
--pipeline-file wdl_pipeline.yaml \
- --zones us-central1-f \
- --inputs-from-file WDL=test-wdl/ga4ghMd5.wdl \
- --inputs-from-file WORKFLOW_INPUTS=test-wdl/ga4ghMd5.inputs.json \
- --inputs-from-file WORKFLOW_OPTIONS=test-wdl/basic.papi.us.options.json \
- --inputs WORKSPACE=gs://YOUR-BUCKET/wdl_runner/work \
- --inputs OUTPUTS=gs://YOUR-BUCKET/wdl_runner/output \
+ --regions us-central1 \
+ --inputs-from-file WDL=test-wdl/ga4ghMd5.wdl,\
+ WORKFLOW_INPUTS=test-wdl/ga4ghMd5.inputs.json,\
+ WORKFLOW_OPTIONS=test-wdl/basic.papi.us.options.json \
+ --env-vars WORKSPACE=gs://YOUR-BUCKET/wdl_runner/work,\
+ OUTPUTS=gs://YOUR-BUCKET/wdl_runner/output \
--logging gs://YOUR-BUCKET/wdl_runner/logging
```

@@ -222,7 +222,7 @@ TOTAL: 2 objects, 5297 bytes (5.17 KiB)
## (6) Check the output

```
- $ gsutil cat gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output/md5sum.txt
+ $ gsutil cat gs://YOUR-BUCKET/wdl_runner/output/md5sum.txt
00579a00e3e7fa0674428ac7049423e2
```

5 changes: 1 addition & 4 deletions runners/cromwell_on_google/wdl_runner/file_util.py
@@ -15,7 +15,6 @@

from googleapiclient import discovery
from googleapiclient.errors import HttpError
- from oauth2client.client import GoogleCredentials

import sys_util

@@ -67,9 +66,7 @@ def verify_gcs_dir_empty_or_missing(path):
prefix = parts[1] if len(parts) > 1 else None

# Get the storage endpoint
- credentials = GoogleCredentials.get_application_default()
- service = discovery.build('storage', 'v1', credentials=credentials,
- cache_discovery=False)
+ service = discovery.build('storage', 'v1', cache_discovery=False)

# Build the request - only need the name
fields = 'nextPageToken,items(name)'
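
Reviewer note on the hunk above: the explicit `GoogleCredentials` lookup is dropped, and `discovery.build` is left to find Application Default Credentials itself when no `credentials` argument is given. The following is a minimal sketch of that pattern, not the repository's code. The helper names `split_gcs_path` and `build_storage_service` are hypothetical, and the path-splitting logic is an assumption based only on the `prefix = parts[1] if len(parts) > 1 else None` context line.

```python
def split_gcs_path(path):
    """Split 'gs://bucket/some/prefix' into (bucket, prefix or None).

    Hypothetical helper: mirrors the 'parts'/'prefix' handling visible in
    the hunk context, but is not copied from file_util.py.
    """
    if not path.startswith('gs://'):
        raise ValueError('not a Cloud Storage path: %r' % path)
    parts = path[len('gs://'):].split('/', 1)
    bucket = parts[0]
    # Treat a missing or empty remainder as "no prefix".
    prefix = parts[1] if len(parts) > 1 and parts[1] else None
    return bucket, prefix


def build_storage_service():
    """Build a Cloud Storage client without passing credentials explicitly."""
    # Imported lazily so split_gcs_path works without the client library.
    from googleapiclient import discovery
    # With no credentials argument, the client library is expected to fall
    # back to Application Default Credentials on its own.
    return discovery.build('storage', 'v1', cache_discovery=False)
```

Calling `split_gcs_path('gs://YOUR-BUCKET/wdl_runner/work')` would give the bucket and prefix to pass to an `objects().list(...)` request on the service returned by `build_storage_service()`.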
2 changes: 1 addition & 1 deletion runners/cromwell_on_google/wdl_runner/jes_template.conf
@@ -24,7 +24,7 @@ backend {
default = "JES"
providers {
JES {
- actor-factory = "cromwell.backend.impl.jes.JesBackendLifecycleActorFactory"
+ actor-factory = "cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory"
config {
project = "${project_id}"
root = "${working_dir}"
@@ -1,3 +1,3 @@
{
- "ga4ghMd5.inputFile": "gs://ga4gh-tool-execution-challenge/phase1/md5sum.input"
- }
+ "ga4ghMd5.inputFile": "gs://ga4gh-tool-execution-challenge/phase1/md5sum.input"
+ }
35 changes: 13 additions & 22 deletions runners/cromwell_on_google/wdl_runner/wdl_pipeline.yaml
@@ -1,24 +1,15 @@
- name: WDL Runner
- description: Run a workflow defined by a WDL file
-
- inputParameters:
- - name: WDL
- description: Workflow definition
- - name: WORKFLOW_INPUTS
- description: Workflow inputs
- - name: WORKFLOW_OPTIONS
- description: Workflow options
-
- - name: WORKSPACE
- description: Cloud Storage path for intermediate files
- - name: OUTPUTS
- description: Cloud Storage path for output files
-
- docker:
- imageName: gcr.io/broad-dsde-outreach/wdl_runner:2017_10_02
-
- cmd: >
- /wdl_runner/wdl_runner.sh
+ actions:
+ - name: SSH
+ imageUri: gcr.io/cloud-genomics-pipelines/tools
+ entrypoint: ssh-server
+ flags: [ 'RUN_IN_BACKGROUND' ]
+ portMappings:
+ 22: 22
+ - name: WDL_Runner
+ commands: [ '/wdl_runner/wdl_runner.sh' ]
+ imageUri: gcr.io/broad-dsde-outreach/wdl_runner:2018_11_28

resources:
- minimumRamGb: 3.75
+ virtualMachine:
+ machineType: n1-standard-1

8 changes: 4 additions & 4 deletions runners/cromwell_on_google/wdl_runner/wdl_runner.sh
@@ -11,13 +11,13 @@ set -o nounset

readonly INPUT_PATH=/pipeline/input

- # WDL, INPUTS, and OPTIONS file contents are all passed into
+ # WDL, INPUTS, and OPTIONS filenames are all passed into
# the pipeline as environment variables - write them out as
# files.
mkdir -p "${INPUT_PATH}"
- echo "${WDL}" > "${INPUT_PATH}/wf.wdl"
- echo "${WORKFLOW_INPUTS}" > "${INPUT_PATH}/wf.inputs.json"
- echo "${WORKFLOW_OPTIONS}" > "${INPUT_PATH}/wf.options.json"
+ cp "${WDL}" "${INPUT_PATH}/wf.wdl"
+ cp "${WORKFLOW_INPUTS}" "${INPUT_PATH}/wf.inputs.json"
+ cp "${WORKFLOW_OPTIONS}" "${INPUT_PATH}/wf.options.json"

# Set the working directory to the location of the scripts
readonly SCRIPT_DIR=$(dirname $0)
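
Reviewer note on the hunk above: the three environment variables change meaning with this commit. Previously they carried the file contents, which the script echoed into place. Now they carry paths to files, which the script copies. A rough Python sketch of the new staging behavior follows, written in Python only so it can be exercised in isolation. The function name `stage_inputs` and the dict-based `env` argument are assumptions for illustration and do not appear in the repository.

```python
import os
import shutil
import tempfile


def stage_inputs(env, input_path):
    """Copy the files named by WDL, WORKFLOW_INPUTS and WORKFLOW_OPTIONS.

    Hypothetical equivalent of the three `cp` lines in wdl_runner.sh:
    each env value is treated as a path to a file, not as file contents.
    """
    targets = {
        'WDL': 'wf.wdl',
        'WORKFLOW_INPUTS': 'wf.inputs.json',
        'WORKFLOW_OPTIONS': 'wf.options.json',
    }
    if not os.path.isdir(input_path):
        os.makedirs(input_path)  # same role as `mkdir -p`
    for var, name in targets.items():
        shutil.copyfile(env[var], os.path.join(input_path, name))


if __name__ == '__main__':
    # Tiny demonstration using throwaway files in a temp directory.
    work = tempfile.mkdtemp()
    src = os.path.join(work, 'example.wdl')
    with open(src, 'w') as handle:
        handle.write('workflow example {}')
    demo_env = {'WDL': src, 'WORKFLOW_INPUTS': src, 'WORKFLOW_OPTIONS': src}
    stage_inputs(demo_env, os.path.join(work, 'input'))
    print(sorted(os.listdir(os.path.join(work, 'input'))))
```

One consequence worth checking in review: a caller that still passes file contents rather than a path would now fail at the copy step instead of silently writing the text out.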