feat(components): add strategy and max_wait_duration to v1 GCPC custom job components/utils

Signed-off-by: Ze Mao <[email protected]>
PiperOrigin-RevId: 684613251
Ze Mao authored and Google Cloud Pipeline Components maintainers committed Oct 14, 2024
1 parent 4ccb047 commit 03095f1
Showing 3 changed files with 13 additions and 0 deletions.
1 change: 1 addition & 0 deletions components/google-cloud/RELEASE.md
@@ -1,5 +1,6 @@
## Upcoming release
* Remove default prediction column names in `v1.model_evaluation.regression_component` component to fix pipeline errors when using a BigQuery data source.
* Add `strategy` and `max_wait_duration` to v1 GCPC custom job components/utils.

## Release 2.17.0
* Fix Gemini batch prediction support to `v1.model_evaluation.autosxs_pipeline` after output schema change.
@@ -28,6 +28,8 @@ def custom_training_job(
worker_pool_specs: List[Dict[str, str]] = [],
timeout: str = '604800s',
restart_job_on_worker_restart: bool = False,
strategy: str = 'STANDARD',
max_wait_duration: str = '86400s',
service_account: str = '',
tensorboard: str = '',
enable_web_access: bool = False,
@@ -48,6 +50,8 @@ def custom_training_job(
worker_pool_specs: Serialized json spec of the worker pools including machine type and Docker image. All worker pools except the first one are optional and can be skipped by providing an empty value. See [more information](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#WorkerPoolSpec).
timeout: The maximum job running time. The default is 7 days. A duration in seconds with up to nine fractional digits, terminated by 's', for example: "3.5s".
restart_job_on_worker_restart: Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
strategy: The strategy to use for the custom training job. The default is 'STANDARD'. See [more information](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#Strategy).
max_wait_duration: The maximum duration to wait for the job to be scheduled, that is, for its requested resources to be provisioned. The default is 24 hours. See [more information](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#Strategy).
service_account: Sets the default service account for workload run-as account. The [service account](https://cloud.google.com/vertex-ai/docs/pipelines/configure-project#service-account) running the pipeline submitting jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code [Service Agent](https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) for the CustomJob's project is used.
tensorboard: The name of a Vertex AI TensorBoard resource to which this CustomJob will upload TensorBoard logs.
enable_web_access: Whether you want Vertex AI to enable [interactive shell access](https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell) to training containers. If `True`, you can access interactive shells at the URIs given by [CustomJob.web_access_uris][].
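The `strategy` argument is forwarded to Vertex AI as a plain string, so a misspelled value is only rejected server-side. A minimal client-side check could look like the sketch below. The helper `validate_strategy` is hypothetical (it is not part of GCPC), and the accepted names are an assumption based on the Vertex AI v1 `Scheduling.Strategy` enum; confirm them against the linked reference before relying on them.

```python
# Hypothetical helper; not part of google_cloud_pipeline_components.
# ASSUMPTION: these names mirror the Vertex AI v1 Scheduling.Strategy enum.
KNOWN_STRATEGIES = frozenset({
    "STRATEGY_UNSPECIFIED",
    "ON_DEMAND",
    "LOW_COST",
    "STANDARD",
    "SPOT",
    "FLEX_START",
})


def validate_strategy(strategy: str = "STANDARD") -> str:
    """Return `strategy` unchanged if it is a known enum name, else raise ValueError."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(
            f"unknown strategy {strategy!r}; expected one of {sorted(KNOWN_STRATEGIES)}"
        )
    return strategy
```

For example, `validate_strategy('SPOT')` returns `'SPOT'`, while `validate_strategy('spot')` raises, because this sketch treats enum names as case-sensitive.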
@@ -75,6 +79,8 @@ def custom_training_job(
'restart_job_on_worker_restart': (
restart_job_on_worker_restart
),
'strategy': strategy,
'max_wait_duration': max_wait_duration,
},
'service_account': service_account,
'tensorboard': tensorboard,
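In this hunk the two new keys are added to the same nested dict as `timeout` and `restart_job_on_worker_restart`, which corresponds to the `scheduling` block of the Vertex AI `CustomJobSpec`. The key name of that dict lies outside the visible hunk, so treat it as an assumption. The sketch below rebuilds that shape as a plain dict; `build_custom_job_payload` is a hypothetical helper, not the component's actual code, and it omits fields such as `tensorboard` and `network`.

```python
from typing import Any, Dict, List


def build_custom_job_payload(
    display_name: str,
    worker_pool_specs: List[Dict[str, Any]],
    timeout: str = "604800s",
    restart_job_on_worker_restart: bool = False,
    strategy: str = "STANDARD",
    max_wait_duration: str = "86400s",
    service_account: str = "",
) -> Dict[str, Any]:
    """Assemble a CustomJob-style request body (hypothetical sketch)."""
    return {
        "display_name": display_name,
        "job_spec": {
            "worker_pool_specs": worker_pool_specs,
            # ASSUMPTION: the new fields sit in the scheduling block,
            # alongside timeout and restart_job_on_worker_restart.
            "scheduling": {
                "timeout": timeout,
                "restart_job_on_worker_restart": restart_job_on_worker_restart,
                "strategy": strategy,
                "max_wait_duration": max_wait_duration,
            },
            "service_account": service_account,
        },
    }
```

Keeping `strategy` and `max_wait_duration` next to `timeout` mirrors the REST resource, where all scheduling-related knobs live in one message rather than at the top level of the job spec.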
@@ -60,6 +60,8 @@ def create_custom_training_job_from_component(
boot_disk_size_gb: int = 100,
timeout: str = '604800s',
restart_job_on_worker_restart: bool = False,
strategy: str = 'STANDARD',
max_wait_duration: str = '86400s',
service_account: str = '',
network: str = '',
encryption_spec_key_name: str = '',
@@ -88,6 +90,8 @@ def create_custom_training_job_from_component(
boot_disk_size_gb: Size in GB of the boot disk (default is 100GB). `boot_disk_size_gb` is set as a static value and cannot be changed as a pipeline parameter.
timeout: The maximum job running time. The default is 7 days. A duration in seconds with up to nine fractional digits, terminated by 's', for example: "3.5s".
restart_job_on_worker_restart: Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
strategy: The strategy to use for the custom training job. The default is 'STANDARD'. See [more information](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#Strategy).
max_wait_duration: The maximum duration to wait for the job to be scheduled, that is, for its requested resources to be provisioned. The default is 24 hours. See [more information](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#Strategy).
service_account: Sets the default service account for workload run-as account. The [service account](https://cloud.google.com/vertex-ai/docs/pipelines/configure-project#service-account) running the pipeline submitting jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code [Service Agent](https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) for the CustomJob's project is used.
network: The full name of the Compute Engine network to which the job should be peered. For example, `projects/12345/global/networks/myVPC`. Format is of the form `projects/{project}/global/networks/{network}`. Where `{project}` is a project number, as in `12345`, and `{network}` is a network name. Private services access must already be configured for the network. If left unspecified, the job is not peered with any network.
encryption_spec_key_name: Customer-managed encryption key options for the CustomJob. If this is set, then all resources created by the CustomJob will be encrypted with the provided encryption key.
@@ -194,6 +198,8 @@ def create_custom_training_job_from_component(
'worker_pool_specs': worker_pool_specs,
'timeout': timeout,
'restart_job_on_worker_restart': restart_job_on_worker_restart,
'strategy': strategy,
'max_wait_duration': max_wait_duration,
'service_account': service_account,
'tensorboard': tensorboard,
'enable_web_access': enable_web_access,
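Both `timeout` and `max_wait_duration` take the duration format the docstrings describe: a number of seconds with up to nine fractional digits, terminated by `s`. The defaults work out to 604800 s = 7 × 24 × 3600 (7 days) and 86400 s = 24 × 3600 (24 hours). The sketch below checks and converts such strings on the client side; `duration_to_seconds` is a hypothetical helper, and it deliberately accepts only non-negative values.

```python
import re
from decimal import Decimal

# Seconds with up to nine fractional digits, terminated by 's',
# e.g. "3.5s" or "604800s". A leading '-' is not accepted.
_DURATION_RE = re.compile(r"(\d+)(\.\d{1,9})?s")


def duration_to_seconds(duration: str) -> Decimal:
    """Convert a duration string such as '86400s' to an exact number of seconds."""
    match = _DURATION_RE.fullmatch(duration)
    if match is None:
        raise ValueError(f"invalid duration {duration!r}; expected a value like '3.5s'")
    whole, fraction = match.groups()
    # Decimal keeps all nine fractional digits without float rounding.
    return Decimal(whole + (fraction or ""))
```

Using `Decimal` rather than `float` means a value such as `"0.000000001s"` round-trips exactly, which matters only at nanosecond precision but costs nothing here.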