Developing bento_wes requires Python 3.10+ and Poetry >=1.5.1.
Workflow execution service for the Bento platform. This service implements the GA4GH WES API schema with additional Bento-specific features.
A workflow is based on a .wdl file, which defines the different tasks and their related I/O dependencies (i.e. which variables or files are required as input, and what the output of the workflow is). See the Workflow Description Language specs for more information. A mandatory JSON file containing the required metadata (variable values, file names, etc. to be used by the workflow) is also provided.
In Bento, each data-related service (e.g. Katsu, Gohan) stores its own workflows in a /**/workflows/ directory. The workflows can be requested from the workflows API endpoints exposed by these microservices (e.g. list all workflows, show details, or download the .wdl file for a specific workflow).
Note that two different files are exported: the regular .wdl file (consumed by bento-wes) and a JSON file which lists the inputs and the outputs of the workflow. This JSON file is consumed by bento-web to build the corresponding form for the user and to propagate some settings to the workflow metadata.
The files generated by the workflow are retrieved (e.g. for storage in DRS) using the output variables defined in this secondary JSON document, which is not part of the WDL spec.
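The retrieval step could look like the sketch below. It assumes that the secondary JSON document lists outputs as objects with id and type fields. It also assumes that the engine reports outputs keyed as <workflow_id>.<output_id>, which is how Cromwell names fully qualified outputs. The collect_output_files helper is hypothetical and not part of bento_wes.

```python
from typing import Any


def collect_output_files(
    workflow_outputs: list[dict[str, Any]],
    engine_outputs: dict[str, Any],
    workflow_id: str,
) -> dict[str, str]:
    """Map each file-typed output ID to the path reported by the workflow engine.

    Hypothetical helper: assumes outputs are described as {"id": ..., "type": ...}
    and that engine outputs are keyed as "<workflow_id>.<output_id>".
    """
    files: dict[str, str] = {}
    for output in workflow_outputs:
        if output.get("type") != "file":
            continue  # only file outputs need to be moved (e.g. to DRS)
        key = f"{workflow_id}.{output['id']}"
        if key in engine_outputs:
            files[output["id"]] = engine_outputs[key]
    return files


if __name__ == "__main__":
    outputs = [{"id": "vcf", "type": "file"}, {"id": "count", "type": "number"}]
    reported = {"my_workflow.vcf": "/tmp/run/out.vcf", "my_workflow.count": 3}
    print(collect_output_files(outputs, reported, "my_workflow"))
```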
Finally, the metadata, including the reference to the workflow, is generated by the bento-web service (not bento-wes!) when the execution is triggered.
The WES container may receive a /runs POST request to execute a given workflow with specified metadata. The WES service then queries the workflow provider to get the relevant .wdl file, which is copied over into a temporary execution folder, along with the metadata as JSON.
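The staging step can be sketched as follows, assuming the .wdl contents have already been fetched from the workflow provider. The stage_run helper, the per-run folder layout, and the file names workflow.wdl and params.json are illustrative assumptions. They are not the service's actual layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def stage_run(run_id: str, wdl_text: str, params: dict[str, Any], temp_root: Path) -> Path:
    """Create a per-run execution folder holding the workflow and its metadata."""
    run_dir = temp_root / run_id
    run_dir.mkdir(parents=True, exist_ok=False)  # fail loudly if a run ID is reused
    (run_dir / "workflow.wdl").write_text(wdl_text)
    (run_dir / "params.json").write_text(json.dumps(params, indent=2))
    return run_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = stage_run("example-run", "version 1.0\n", {"my_workflow.x": 1}, Path(tmp))
        print(sorted(p.name for p in run_dir.iterdir()))
```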
The Cromwell workflow management system is used to execute the WDL files. As a first step, dependencies such as input files are copied over locally. Note that in development mode, the temporary files are not cleaned up after completion.
Each run is monitored and its state is stored in a local database.
Note that some metadata may contain callback URLs, which are called once the workflow described in the .wdl file has been executed. This is the case for Katsu ingestion workflows.
The WES needs to access the files used as input. It may also pass references to files to other services as part of the workflow. For example, during an ingestion workflow, a file must be passed to the relevant data service for ingestion into its internal database. This file transfer is based on mounted volumes shared between the containers.
Of note, the wes/tmp directory is mounted in some data service containers (with the exception of Gohan, which mounts the dropbox data directory instead). When a workflow is executed, this is where the necessary input files are staged. This side effect is used to pass files for ingestion to the relevant containers.
Some workflows (ingestion workflows in Katsu) contain an "identity" task, which only takes a path to a dropbox file as input and returns a local path to a temp file. Note that the /wes/tmp volume must be mounted at the same path in every container for this to work seamlessly.
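The path returned by the identity task only means something to another container if it lies inside the shared volume. A check like the one below could therefore be applied before handing a path to a data service. This sketch assumes the volume is mounted at /wes/tmp in every container; is_shareable is a hypothetical name.

```python
from pathlib import PurePosixPath

# Assumed mount point of the shared volume, identical in every container
SHARED_TMP = PurePosixPath("/wes/tmp")


def is_shareable(path: str, shared_root: PurePosixPath = SHARED_TMP) -> bool:
    """True if path is an absolute path inside the shared volume."""
    p = PurePosixPath(path)
    # Pure paths do not resolve "..", so reject it explicitly to avoid escaping the root
    return p.is_absolute() and ".." not in p.parts and p.is_relative_to(shared_root)
```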
ca.c3g.bento:wes:VERSION
Parameter:
{
  "workflow_params": {/* ... */}, // unused?
  "workflow_type": "WDL",
  "workflow_type_version": "1.0",
  "workflow_engine_parameters": {}, // unused
  "workflow_url": "...", // where the WES can fetch the .wdl file
  "tags": {
    "workflow_id": "...", // must correspond to the workflow_params namespace
    "workflow_metadata": {
      "inputs": [{}] // defines setup for injecting values into the .wdl input section; IDs must align
    }
  }
}
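A client could assemble this request as sketched below. The sketch rests on two assumptions. First, as in the GA4GH WES RunWorkflow operation, the fields are sent as multipart/form-data with structured values JSON-encoded. Second, the "namespace" mentioned above means input keys are prefixed with the workflow ID (Cromwell's <workflow>.<input> convention). build_run_request is a hypothetical helper.

```python
import json
from typing import Any


def build_run_request(
    workflow_id: str,
    workflow_url: str,
    inputs: dict[str, Any],
    workflow_metadata: dict[str, Any],
) -> dict[str, str]:
    """Assemble the form fields for POST /runs (hypothetical helper)."""
    # Namespace each input with the workflow ID, e.g. {"x": 1} -> {"my_workflow.x": 1}
    workflow_params = {f"{workflow_id}.{key}": value for key, value in inputs.items()}
    tags = {"workflow_id": workflow_id, "workflow_metadata": workflow_metadata}
    # Form fields are strings, so structured values are JSON-encoded
    return {
        "workflow_params": json.dumps(workflow_params),
        "workflow_type": "WDL",
        "workflow_type_version": "1.0",
        "workflow_engine_parameters": json.dumps({}),
        "workflow_url": workflow_url,
        "tags": json.dumps(tags),
    }
```

The resulting dict could then be sent as form data with any HTTP client.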
Note: this diverges from the GA4GH recommendations, according to which tags.workflow_metadata should be in workflow_params. The usage of the tags property is Bento-specific, and the callback mechanism should probably be part of the task definitions.
Lists all runs. Optional parameter: with_details (BOOL).
Details of the run corresponding to the given UUID.
Stream of the run's stdout or stderr, respectively.
Cancel a run.
Get a run's state.
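The URLs for these operations can be built as sketched below. The sketch assumes the standard GA4GH WES paths: /runs, /runs/{run_id}, /runs/{run_id}/status and /runs/{run_id}/cancel. The stdout/stderr path segments and the "true" encoding of with_details are also assumptions. wes_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def wes_url(base_url: str, *parts: str, **query: str) -> str:
    """Join path segments onto the service base URL, with an optional query string."""
    url = "/".join([base_url.rstrip("/"), *parts])
    return f"{url}?{urlencode(query)}" if query else url


if __name__ == "__main__":
    base = "http://localhost:5000/"
    run_id = "00000000-0000-0000-0000-000000000000"  # placeholder UUID
    print(wes_url(base, "runs", with_details="true"))  # GET: list runs with details
    print(wes_url(base, "runs", run_id))  # GET: run details
    print(wes_url(base, "runs", run_id, "stdout"))  # GET: stdout stream (path assumed)
    print(wes_url(base, "runs", run_id, "cancel"))  # POST: cancel run
    print(wes_url(base, "runs", run_id, "status"))  # GET: run state
```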
# Bento instance or service base URL, used for generating absolute URLs within
# the service, for making requests, and for re-writing internal URLs in the case
# of Singularity-based Bento instances
BENTO_URL=http://127.0.0.1:5000/
# Debug mode for the service - falls back to FLASK_ENV (development = true,
# any other value = false) if not set
# SECURITY NOTE: This SHOULD NOT EVER be enabled in production, as it removes
# checks for TLS certificate validity!
BENTO_DEBUG=False
# SSL Configuration - whether to validate certificates
BENTO_VALIDATE_SSL=True
# Celery configuration
CELERY_RESULT_BACKEND=redis://
CELERY_BROKER_URL=redis://
# Event Redis connection
BENTO_EVENT_REDIS_URL=redis://localhost:6379
# Run/task database location
DATABASE=data/bento_wes.db
# Service configuration
# - unique service ID within the Bento instance
SERVICE_ID=
# - persistent data directory - this is used for file output artifacts from
# workflows, which is especially useful for analysis/export workflows.
SERVICE_DATA=data
# - temporary data directory - the service currently does not make this by
# itself, so this must be created prior to startup
SERVICE_TEMP=tmp
# - base url for service endpoints
SERVICE_BASE_URL=http://localhost:5000/
# Location of WOMtool, used to validate WDL files
# - If not set, no WDL validation will be done
# - SECURITY: If not set, WORKFLOW_HOST_ALLOW_LIST must contain a comma-separated
# list of hosts workflow files can be downloaded from
WOM_TOOL_LOCATION=/path/to/womtool.jar
# Allow-list (comma-separated) for hosts that workflow files can be downloaded
# from - prevents possibly insecure WDLs from being run
WORKFLOW_HOST_ALLOW_LIST=
# Service URL configuration:
BENTO_AUTHZ_SERVICE_URL=
DRS_URL=https://portal.bentov2.local/api/drs
SERVICE_REGISTRY_URL=
# CORS
CORS_ORIGINS='*'
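Some of the variables above hold booleans or lists and need parsing. The sketch below shows how BENTO_DEBUG (with its FLASK_ENV fallback) and the comma-separated WORKFLOW_HOST_ALLOW_LIST could be read. The accepted truthy spellings and the helper names are assumptions, not the service's exact rules.

```python
import os
from collections.abc import Mapping

# Assumed set of strings treated as "true"
TRUTHY = {"true", "1", "yes", "on"}


def debug_mode(env: Mapping[str, str] = os.environ) -> bool:
    """BENTO_DEBUG wins if set; otherwise fall back to FLASK_ENV == development."""
    if "BENTO_DEBUG" in env:
        return env["BENTO_DEBUG"].strip().lower() in TRUTHY
    return env.get("FLASK_ENV", "").strip() == "development"


def host_allow_list(env: Mapping[str, str] = os.environ) -> list[str]:
    """Split the comma-separated allow-list, dropping blank entries."""
    raw = env.get("WORKFLOW_HOST_ALLOW_LIST", "")
    return [host.strip() for host in raw.split(",") if host.strip()]
```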
wes_run_updated: TODO
wes_run_completed: TODO
After cloning the repository, let Poetry manage the virtual environment and install the development dependencies for you:
pip install poetry  # if not already installed
poetry install # will automatically create a virtual environment
To run all tests and linting, use the following command:
poetry run tox
- All tests pass
- Package version has been updated (following semver) in bento_lib/package.cfg
- A release can then be created, tagged in the format of v#.#.# and named in the format of Version #.#.#, listing any changes made, in the GitHub releases page, tagged from the master branch!
The bento_wes project uses semantic versioning for releasing. If the API is broken in any way, including minor differences in the way a function behaves given an identical set of parameters (excluding bugfixes for unintentional behaviour), the MAJOR version must be incremented. In this way, we guarantee that projects relying on this API do not accidentally break upon upgrading.
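The versioning rules and the release naming convention above can be summarized as a small helper. This is only an illustrative sketch; the functions are hypothetical and not part of the project's tooling. It handles plain MAJOR.MINOR.PATCH strings without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next semantic version; part is "major", "minor" or "patch"."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":  # any breaking API change
        return f"{major + 1}.0.0"
    if part == "minor":  # backwards-compatible feature
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


def release_labels(version: str) -> tuple[str, str]:
    """Tag (v#.#.#) and release name (Version #.#.#) for a GitHub release."""
    return f"v{version}", f"Version {version}"
```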
The bento_wes service can be deployed with a WSGI server like Gunicorn or uWSGI, specifying bento_wes.app:application as the WSGI application.
It is best to then put HTTP server software such as NGINX in front of the WSGI server.
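Below is a sketch of a Gunicorn configuration file for this setup. It assumes Gunicorn 20.1 or later, which added the wsgi_app setting, and NGINX proxying to a local port. The port and worker count are placeholders. The file could be used with gunicorn -c gunicorn.conf.py.

```python
# gunicorn.conf.py: Gunicorn reads its settings from module-level variables
wsgi_app = "bento_wes.app:application"  # the WSGI entry point named above
bind = "127.0.0.1:5000"  # listen only locally; NGINX handles public traffic
workers = 2  # placeholder; tune for the host
```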
Flask applications should NEVER be deployed in production via the Flask development server, i.e. flask run!
To run the Celery worker (required to actually run jobs), the following command (or similar) can be used:
nohup poetry run celery --loglevel=INFO --app bento_wes.app worker &> celery.log &
This service is built around a Flask application. It uses Celery to monitor and run workflows executed by Cromwell.
The workflows are downloaded from local services. There are no checks on the workflows' validity in that case; the assumption is that workflows coming from configured hosts are correct (see the WORKFLOW_HOST_ALLOW_LIST environment variable above).
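Below is a sketch of such a host check. It assumes the allow-list contains bare hostnames (no ports) and that only http(s) URLs are acceptable. is_allowed_workflow_url is hypothetical, and the real service may compare hosts differently.

```python
from urllib.parse import urlparse


def is_allowed_workflow_url(url: str, allowed_hosts: list[str]) -> bool:
    """Accept only http(s) URLs whose hostname exactly matches an allowed host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # .hostname is lowercased and has the port stripped
    return parsed.hostname is not None and parsed.hostname in {h.lower() for h in allowed_hosts}
```

Exact matching means a look-alike host such as katsu.example.org is rejected even when katsu is allowed.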
For now, the WOMtool utility used for checking the validity of .wdl files is disabled in Bento (see the corresponding Dockerfile).
This script contains the route definitions as Flask Blueprints.
This script contains the implementation of workflow execution.
The expected inputs come from the workflow metadata (Bento-specific), which also defines how bento_web will render the workflow set-up UI.
Another extension to the workflow metadata inputs is used to get values from the WES configuration variables. The special value FROM_CONFIG causes the value to be interpolated from the Flask app.config property whose name is the input's id in uppercase. In the following example, the value for this variable will come from the config property KATSU_URL.
{
  // ...,
  "inputs": [
    {
      "id": "katsu_url",
      "type": "string",
      "required": true,
      "value": "FROM_CONFIG",
      "hidden": true
    }, // ...
  ],
  // ...
}
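Below is a minimal sketch of this interpolation. A plain dict stands in for Flask's app.config, which is itself a dict subclass. resolve_inputs is a hypothetical name, and raising KeyError for a missing config key is an assumed behaviour.

```python
from collections.abc import Mapping
from typing import Any

FROM_CONFIG = "FROM_CONFIG"


def resolve_inputs(inputs: list[dict[str, Any]], config: Mapping[str, Any]) -> dict[str, Any]:
    """Return {input id: value}, pulling FROM_CONFIG values from config[ID.upper()]."""
    resolved: dict[str, Any] = {}
    for inp in inputs:
        value = inp.get("value")
        if value == FROM_CONFIG:
            # e.g. "katsu_url" -> config["KATSU_URL"]; KeyError if it is missing
            value = config[inp["id"].upper()]
        resolved[inp["id"]] = value
    return resolved


if __name__ == "__main__":
    inputs = [{"id": "katsu_url", "type": "string", "value": "FROM_CONFIG", "hidden": True}]
    print(resolve_inputs(inputs, {"KATSU_URL": "http://katsu:8000"}))
```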