Vertex AI job operators in deferrable mode assume job uses Managed Model #40476

Closed
1 of 2 tasks
vignesh-sc opened this issue Jun 28, 2024 · 4 comments · Fixed by #40685
Assignees
Labels
area:providers, good first issue, kind:bug (This is clearly a bug), provider:google (Google (including GCP) related issues)

Comments


vignesh-sc commented Jun 28, 2024

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

10.18.0

Apache Airflow version

2.8.4

Operating System

Linux

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

When using CreateCustomContainerTrainingJobOperator in deferrable mode without exporting a managed model, the operator fails when processing the trigger result with the following error:

  File "/opt/python3.11/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 1629, in resume_execution
    return execute_callable(context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/providers/google/cloud/operators/vertex_ai/custom_job.py", line 574, in execute_complete
    model_id = self.hook.extract_model_id_from_training_pipeline(result)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/providers/google/cloud/hooks/vertex_ai/custom_job.py", line 266, in extract_model_id_from_training_pipeline
    return training_pipeline["model_to_upload"]["name"].rpartition("/")[-1]

On further exploration, this seems to happen because execute_complete assumes the training pipeline result always contains a model, whereas execute explicitly checks whether a model was returned before extracting its ID.

What you think should happen instead

Whether or not the operator runs in deferrable mode, it should check whether a model was exported before trying to extract its ID. A sketch of such a guard is shown below.
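
As an illustration only (not necessarily how the linked fix implements it), execute_complete could apply the same guard that execute does. The snippet assumes result is the training pipeline dict shown in the traceback above, and the XCom key name is made up for the example:

  # Illustrative sketch only -- the real fix is whatever PR #40685 does.
  # `result` is assumed to be the training pipeline dict that execute_complete
  # currently passes straight to extract_model_id_from_training_pipeline().
  model = result.get("model_to_upload") or {}
  if model.get("name"):
      model_id = self.hook.extract_model_id_from_training_pipeline(result)
      self.xcom_push(context, key="model_id", value=model_id)  # key name is illustrative
  else:
      self.log.info(
          "The training pipeline did not produce a managed model; skipping model ID extraction."
      )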

How to reproduce

In order to reproduce (a minimal DAG sketch follows the steps):

  1. Create a task using CreateCustomContainerTrainingJobOperator in deferrable mode.
  2. Don't set the model parameters.
  3. Execute the task.
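
A minimal DAG sketch for the steps above, assuming the usual minimal arguments of CreateCustomContainerTrainingJobOperator; project, region, bucket and image values are placeholders, and the key point is that no model_serving_container_image_uri or model_display_name is set, so no managed model is exported:

  from datetime import datetime

  from airflow import DAG
  from airflow.providers.google.cloud.operators.vertex_ai.custom_job import (
      CreateCustomContainerTrainingJobOperator,
  )

  with DAG(
      dag_id="vertex_ai_deferrable_repro",  # placeholder DAG id
      start_date=datetime(2024, 1, 1),
      schedule=None,
      catchup=False,
  ):
      CreateCustomContainerTrainingJobOperator(
          task_id="create_custom_container_training_job",
          project_id="my-gcp-project",              # placeholder
          region="us-central1",                     # placeholder
          staging_bucket="gs://my-staging-bucket",  # placeholder
          display_name="custom-container-training",
          container_uri="us-docker.pkg.dev/my-gcp-project/train/train:latest",  # placeholder
          # No model_serving_container_image_uri / model_display_name,
          # so the job does not export a managed model.
          deferrable=True,
      )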

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

vignesh-sc added the area:providers, kind:bug, and needs-triage labels on Jun 28, 2024

boring-cyborg bot commented Jun 28, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

eladkal (Contributor) commented Jun 28, 2024

cc @e-galan @VladaZakharova, can you take a look?

eladkal added the provider:google and good first issue labels and removed the needs-triage label on Jun 28, 2024
e-galan (Contributor) commented Jun 28, 2024

Hi @eladkal, I will check it.

e-galan (Contributor) commented Jul 10, 2024

Hi @vignesh-sc, I have just submitted a community PR to address the issue. You are welcome to check it.
