Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to wait for completion on the EmrCreateJobFlowOperator #28827

Merged
merged 5 commits into from
Jan 30, 2023

Conversation

BasPH
Copy link
Contributor

@BasPH BasPH commented Jan 10, 2023

This PR adds the option to wait for completion with the EmrCreateJobFlowOperator. It includes:

  • The implementation of a waiter to wait for WAITING or TERMINATED state.
  • To avoid breaking changes, the default is set to wait_for_completion=False. This matches the current behavior and will start a job flow and immediately succeed the Airflow task.
  • A few consistency fixes here and there. AWS writes "job flow" (lowercase) so I adhered to that.
  • Also implemented an on_kill method. Moved EMR hook creation & caching to a cached_property because the hook is called from multiple methods.
  • I don't think a default max. runtime of 25 minutes is desirable, I've had many projects which ran EMR jobs longer than that. I allowed None for the waiter.countdown argument type to represent waiting for infinity (or until the Airflow task times out based on execution_timeout). Defaulted the EmrCreateJobFlowOperator.waiter_countdown to None. Checked all other usages of waiter_countdown and they all default to 25 * 60 so no breaking changes.

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@BasPH BasPH requested a review from eladkal as a code owner January 10, 2023 12:28
@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Jan 10, 2023
airflow/providers/amazon/aws/utils/waiter.py Show resolved Hide resolved
Comment on lines +597 to +598
waiter(
get_state_callable=self._emr_hook.get_conn().describe_cluster,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see someone using this already to create new customer waiters! 🤩

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this isn't the waiter setup I thought it was originally (thanks @ferruzzi for pointing that out!). You can find details on the new custom waiters here. Though I'm actually happy to merge this PR with the waiter you used, and then move all of the EMR waiters to the new waiter system in another PR rather than scope creeping this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't realize you could actually implement custom waiters that way, hence my comment above: https://github.com/BasPH/airflow/blob/add-emrcreatejobflow-waitforcompletion/airflow/providers/amazon/aws/operators/emr.py#L595-L596.

I'll take a look.

@@ -169,6 +169,15 @@ def add_job_flow_steps(
)
return response["StepIds"]

def terminate_job_flow(self, job_flow_id: str) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally try to avoid functions in hooks which just wrap boto3 api. You can call the boto3 api directly from the operator

"""Terminate job flow."""
if self._job_flow_id:
self.log.info("Terminating job flow %s", self._job_flow_id)
self._emr_hook.terminate_job_flow(self._job_flow_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self._emr_hook.terminate_job_flow(self._job_flow_id)
self._emr_hook.get_conn().terminate_job_flows(JobFlowIds=[job_flow_id])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @BasPH, thoughts on this suggested change? Otherwise the PR looks good

@o-nikolas
Copy link
Contributor

There hasn't been much movement on this one. The remaining comments are minor and we can always circle back on them. I'm going to merge this as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants