Vertex AI job operators in deferrable mode assume job uses Managed Model #40476

Closed
1 of 2 tasks
vignesh-sc opened this issue Jun 28, 2024 · 4 comments · Fixed by #40685
Assignees
Labels
area:providers, good first issue, kind:bug (This is clearly a bug), provider:google (Google (including GCP) related issues)

Comments


vignesh-sc commented Jun 28, 2024

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

10.18.0

Apache Airflow version

2.8.4

Operating System

Linux

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

When using CreateCustomContainerTrainingJobOperator in deferrable mode without exporting a managed model, the operator fails when processing the trigger result with the following error:

  File "/opt/python3.11/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 1629, in resume_execution
    return execute_callable(context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/providers/google/cloud/operators/vertex_ai/custom_job.py", line 574, in execute_complete
    model_id = self.hook.extract_model_id_from_training_pipeline(result)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/providers/google/cloud/hooks/vertex_ai/custom_job.py", line 266, in extract_model_id_from_training_pipeline
    return training_pipeline["model_to_upload"]["name"].rpartition("/")[-1]

On further exploration, this seems to happen because execute_complete assumes the training pipeline result always contains a model, whereas execute explicitly checks whether a model was returned before extracting its ID.

What you think should happen instead

Whether or not the operator runs in deferrable mode, it should check whether a model was exported before trying to extract its ID. A sketch of such a guard is shown below.
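
As an illustration only (not necessarily how the linked fix implements it), execute_complete could apply the same guard that execute does. The snippet assumes result is the training pipeline dict shown in the traceback above, and the XCom key name is made up for the example:

  # Illustrative sketch only -- the real fix is whatever PR #40685 does.
  # `result` is assumed to be the training pipeline dict that execute_complete
  # currently passes straight to extract_model_id_from_training_pipeline().
  model = result.get("model_to_upload") or {}
  if model.get("name"):
      model_id = self.hook.extract_model_id_from_training_pipeline(result)
      self.xcom_push(context, key="model_id", value=model_id)  # key name is illustrative
  else:
      self.log.info(
          "The training pipeline did not produce a managed model; skipping model ID extraction."
      )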

How to reproduce

In order to reproduce (a minimal DAG sketch follows the steps):

  1. Create a task using CreateCustomContainerTrainingJobOperator in deferrable mode.
  2. Don't set the model parameters.
  3. Execute the task.
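
A minimal DAG sketch for the steps above, assuming the usual minimal arguments of CreateCustomContainerTrainingJobOperator; project, region, bucket and image values are placeholders, and the key point is that no model_serving_container_image_uri or model_display_name is set, so no managed model is exported:

  from datetime import datetime

  from airflow import DAG
  from airflow.providers.google.cloud.operators.vertex_ai.custom_job import (
      CreateCustomContainerTrainingJobOperator,
  )

  with DAG(
      dag_id="vertex_ai_deferrable_repro",  # placeholder DAG id
      start_date=datetime(2024, 1, 1),
      schedule=None,
      catchup=False,
  ):
      CreateCustomContainerTrainingJobOperator(
          task_id="create_custom_container_training_job",
          project_id="my-gcp-project",              # placeholder
          region="us-central1",                     # placeholder
          staging_bucket="gs://my-staging-bucket",  # placeholder
          display_name="custom-container-training",
          container_uri="us-docker.pkg.dev/my-gcp-project/train/train:latest",  # placeholder
          # No model_serving_container_image_uri / model_display_name,
          # so the job does not export a managed model.
          deferrable=True,
      )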

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

vignesh-sc added the area:providers, kind:bug, and needs-triage labels on Jun 28, 2024

boring-cyborg bot commented Jun 28, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

eladkal (Contributor) commented Jun 28, 2024

cc @e-galan @VladaZakharova, can you take a look?

eladkal added the provider:google and good first issue labels and removed the needs-triage label on Jun 28, 2024
e-galan (Contributor) commented Jun 28, 2024

Hi @eladkal, I will check it.

e-galan (Contributor) commented Jul 10, 2024

Hi @vignesh-sc, I have just submitted a community PR to address the issue. You are welcome to check it.
