WX-1179 GCP Batch Docs Update #7196

Merged
merged 6 commits into from
Aug 11, 2023
68 changes: 11 additions & 57 deletions docs/backends/GCPBatch.md
@@ -3,9 +3,9 @@
[//]:
Google Cloud Batch is a fully managed service that lets you schedule, queue, and execute batch processing workloads on Google Cloud resources. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale.

This section offers detailed configuration instructions for using Cromwell with the Batch API in all supported
This section offers detailed configuration instructions for using Cromwell with Google Cloud Batch in all supported
authentication modes. Before reading further in this section please see the
[Getting started on Google Batch API](../tutorials/Batch101) for instructions common to all authentication modes
[Getting started on Google Cloud Batch](../tutorials/Batch101) for instructions common to all authentication modes
and detailed instructions for the application default authentication scheme in particular.
The instructions below assume you have created a Google Cloud Storage bucket and a Google project enabled for the appropriate APIs.

@@ -90,7 +90,6 @@ While technically not part of Service Account authentication mode, one can also

A [JSON key file for the service account](../wf_options/Google.md) must be passed in via the `user_service_account_json` field in the [Workflow Options](../wf_options/Google.md) when submitting the job. Omitting this field will cause the workflow to fail. The JSON should be passed as a string and will need to have no newlines and all instances of `"` and `\n` escaped.

[//]: # (TODO: is jes_gcs_root the correct workflow option?)
In the likely event that this service account does not have access to Cromwell's default Google project, the `google_project` workflow option must be set. In the similarly likely case that this service account cannot access Cromwell's default Google bucket, the `jes_gcs_root` workflow option should be set appropriately.
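
As a rough illustration of the escaping described above, the following Python sketch builds a workflow options document from a service account key. The function name `build_workflow_options` and its parameters are assumptions made for illustration and are not part of Cromwell; only the option names `user_service_account_json`, `google_project` and `jes_gcs_root` come from the text above.

```python
import json


def build_workflow_options(service_account_key, google_project=None, gcs_root=None):
    """Return workflow options JSON text (hypothetical helper, not a Cromwell API).

    service_account_key is the parsed content of the JSON key file (a dict).
    """
    # Serialize the key to a single-line string: json.dumps never emits raw
    # newlines here, and newlines inside the private key become "\n" escapes.
    key_as_string = json.dumps(service_account_key, separators=(",", ":"))

    options = {"user_service_account_json": key_as_string}
    if google_project is not None:
        options["google_project"] = google_project
    if gcs_root is not None:
        options["jes_gcs_root"] = gcs_root

    # Serializing the outer document escapes every `"` and backslash in the
    # embedded string, so the key ends up as an escaped JSON string value.
    return json.dumps(options, indent=2)
```

Loading the key file with `json.load` and passing the resulting dict to this helper avoids escaping the key by hand; the returned text can be saved and submitted as the workflow options file.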

For information on the interaction of `user_service_account_json` with private Docker images please see the `Docker` section below.
@@ -113,13 +112,11 @@ task mytask {
}
```

In order for a private image to be used the appropriate Docker configuration must be provided. If the Docker images being used
In order for a private image to be used, Docker Hub credentials must be provided. If the Docker images being used
are public there is no need to add this configuration.

For Batch

[//]: # (TODO: Is this the correct way to configure Docker for batch?)
[//]: # (5-4-23: Leave alone for now)
```
backend {
default = GCPBATCH
@@ -129,8 +126,6 @@ backend {
config {
dockerhub {
token = "base64-encoded-docker-hub-username:password"
key-name = "name/of/the/kms/key/used/for/encrypting/and/decrypting/the/docker/hub/token"
auth = "reference-to-the-auth-cromwell-should-use-for-kms-encryption"
}
}
}
@@ -140,42 +135,6 @@ backend {

`token` is the standard base64-encoded username:password for the appropriate Docker Hub account.
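
A minimal sketch of how such a token could be produced, assuming the usual `username:password` form encoded with standard base64. The helper name `make_dockerhub_token` and the example credentials are hypothetical.

```python
import base64


def make_dockerhub_token(username, password):
    """Return base64("username:password") as text (hypothetical helper)."""
    raw = (username + ":" + password).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Example with placeholder credentials; never commit real credentials.
example_token = make_dockerhub_token("user", "pass")
```

The resulting string is what would go into the `token` field of the `dockerhub` block.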

`key-name` is the name of the Google KMS key Cromwell should use for encrypting the Docker `token` before including it
in the PAPI job execution request. This `key-name` will also be included in the PAPI job execution
request and will be used by Batch to decrypt the Docker token used by `docker login` to enable access to the private Docker image.

`auth` is a reference to the name of an authorization in the `auths` block of Cromwell's `google` config.
Cromwell will use this authorization for encrypting the Google KMS key.

The equivalents of `key-name`, `token` and `auth` can also be specified in workflow options which take
precedence over values specified in configuration. The corresponding workflow options are named `docker_credentials_key_name`,
`docker_credentials_token`, and `user_service_account_json`. While the config value `auth` refers to an auth defined in the
`google.auths` stanza elsewhere in Cromwell's
configuration, `user_service_account_json` is expected to be a literal escaped Google service account auth JSON.
See the `User Service Account` section above for more information on using user service accounts.
If the key, token or auth value is provided in workflow options then the corresponding private Docker configuration value
is not required, and vice versa. Also note that for the `user_service_account_json` workflow option to work an auth of type `user_service_account`
must be defined in Cromwell's `google.auths` stanza; more details in the `User Service Account` section above.

Example Batch workflow options for private Docker configuration:

```
{
"docker_credentials_key_name": "name/of/the/kms/key/used/for/encrypting/and/decrypting/the/docker/hub/token",
"docker_credentials_token": "base64_username:password",
"user_service_account_json": "<properly escaped user service account JSON file>"
}
```

Important

If any of the three private Docker configuration values of key name, auth, or Docker token are missing, Batch will not perform a `docker login`.
If the Docker image to be pulled is not public the `docker pull` will fail which will cause the overall job to fail.

If using any of these private Docker workflow options it is advisable to add
them to the `workflow-options.encrypted-fields` list in Cromwell configuration.


**Monitoring**

To monitor VM metrics (CPU, memory, disk usage, and so on) while a call is running, a workflow option can be used to specify the path to a script that will run in the background and write its output to a log file.
@@ -207,7 +166,7 @@ backend.providers.GCPBATCH.config {

#### Google Labels

Every call run on the Batch API backend is given certain labels by default, so that Google resources can be queried by these labels later.
Every call run on the GCP Batch backend is given certain labels by default, so that Google resources can be queried by these labels later.
The current default label set automatically applied is:

| Key | Value | Example | Notes |
@@ -217,7 +176,7 @@ The current default label set automatically applied is:
| wdl-task-name | The name of the WDL task | my-task | |
| wdl-call-alias | The alias of the WDL call that created this job | my-task-1 | Only present if the task was called with an alias. |

Any custom labels provided as '`google_labels`' in the [workflow options](../wf_options/Google) are also applied to Google resources by the Batch API.
Any custom labels provided as '`google_labels`' in the [workflow options](../wf_options/Google) are also applied to Google resources by GCP Batch.

### Virtual Private Network

@@ -257,12 +216,12 @@ configuration key, which is `vpc-network` here, as the name of private network a
If the network name is not present in the config Cromwell will fall back to trying to run jobs on the default network.

If the `network-name` or `subnetwork-name` values contain the string `${projectId}` then that string will be replaced
by Cromwell with the name of the project running the Batch API.
by Cromwell with the name of the project running GCP Batch.

If the `network-name` does not contain a `/` then it will be prefixed with `projects/${projectId}/global/networks/`.
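
The two rules above can be sketched as follows. This is an illustration of the described behaviour, not Cromwell's actual implementation; the function name `resolve_network_name` and the example project and network names are assumptions.

```python
PROJECT_PLACEHOLDER = "${projectId}"
NETWORK_PREFIX = "projects/${projectId}/global/networks/"


def resolve_network_name(network_name, project_id):
    """Expand a configured network name as described above (illustrative only)."""
    # A bare name (no "/") is prefixed with the global networks path.
    if "/" not in network_name:
        network_name = NETWORK_PREFIX + network_name
    # Every occurrence of the placeholder is replaced with the project.
    return network_name.replace(PROJECT_PLACEHOLDER, project_id)
```

Under these assumptions, a bare name such as `my-vpc` in project `my-project` resolves to `projects/my-project/global/networks/my-vpc`, while a value that already contains a `/` only has its placeholder substituted.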

Cromwell will then pass the network and subnetwork values to the Batch API. See the documentation for the
[Batch API](https://cloud.google.com/batch/docs/networking-overview)
Cromwell will then pass the network and subnetwork values to GCP Batch. See the documentation for
[GCP Batch](https://cloud.google.com/batch/docs/networking-overview)
for more information on the various formats accepted for `network` and `subnetwork`.

#### Virtual Private Network via Labels
@@ -306,7 +265,6 @@ network labels, and then fall back to running on the default network.

### Custom Google Cloud SDK container

[//]: # (TODO: need to test this section as well)
Cromwell can't use Google's container registry if a VPC perimeter is used in the project.
Your own repository can be used instead by adding a `cloud-sdk-image-url` entry that references the container to use:

@@ -320,8 +278,6 @@ google {

### Parallel Composite Uploads

[//]: # (TODO: Need to test parallel composite uploads)

Cromwell can be configured to use GCS parallel composite uploads which can greatly improve delocalization performance. This feature
is turned off by default but can be enabled backend-wide by specifying a `gsutil`-compatible memory specification for the key
`genomics.parallel-composite-upload-threshold` in backend configuration. This memory value represents the minimum size an output file
@@ -394,20 +350,18 @@ outputs. Calls which are executed and not cached will always honor the parallel
their execution.


### Migration from Google Cloud Genomics v2alpha1 to Google Cloud Life Sciences v2beta
### Migration from Google Cloud Life Sciences v2beta to Google Cloud Batch

1. If you currently run your workflows using Cloud Genomics v2beta and would like to switch to Google Batch, you will need to do a few changes to your configuration file: `actor-factory` value should be changed
1. If you currently run your workflows using Cloud Life Sciences v2beta and would like to switch to Google Cloud Batch, you will need to make a few changes to your configuration file: the `actor-factory` value should be changed
from `cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory` to `cromwell.backend.google.batch.GcpBatchLifecycleActorFactory`.

2. You will need to remove the parameter `genomics.endpoint-url` and generate a new config file.

3. Google Batch is now available in a variety of regions. Please see the [Batch Locations](https://cloud.google.com/batch/docs/locations) for a list of supported regions
3. Google Cloud Batch is now available in a variety of regions. Please see [Batch Locations](https://cloud.google.com/batch/docs/locations) for a list of supported regions.
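
The first two migration steps can be sketched as a small text-level check. This treats the configuration as plain text rather than parsing HOCON, so it is only a rough aid under that assumption; the function name `migrate_config_text` is hypothetical, and any reported lines still need to be reviewed and removed by hand.

```python
OLD_FACTORY = "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory"
NEW_FACTORY = "cromwell.backend.google.batch.GcpBatchLifecycleActorFactory"


def migrate_config_text(config_text):
    """Swap the actor factory and report leftover endpoint-url lines.

    Returns (migrated_text, leftover_lines). Illustrative only: it does not
    parse HOCON and does not delete anything on its own.
    """
    migrated = config_text.replace(OLD_FACTORY, NEW_FACTORY)
    # Step 2 asks for `genomics.endpoint-url` to be removed; list candidates.
    leftovers = [
        line.strip() for line in migrated.splitlines() if "endpoint-url" in line
    ]
    return migrated, leftovers
```

An empty `leftover_lines` list suggests that no `endpoint-url` setting remains in the text that was checked.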


### Reference Disk Support

[//]: # (TODO: follow up later)

Cromwell 55 and later support mounting reference disks from prebuilt GCP disk images as an alternative to localizing large
input reference files on Batch. Please note the configuration of reference disk manifests has changed starting with
Cromwell 57 and now uses the format documented below.