Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-4453] Make behavior of none_failed consistent with documentation #7464

Merged
merged 8 commits into from
Mar 26, 2020

Conversation

yuqian90
Copy link
Contributor

@yuqian90 yuqian90 commented Feb 19, 2020

This is to pick up the work started in #7098. I discussed with @TV4Fun and opened this new PR to continue the work.

  • Make behavior of none_failed consistent with documentation. Do not skip itself when all parents are skipped.
  • Add none_failed_or_skipped which behaves like the old none_failed trigger rule (i.e. all parents are success or skipped and at least one parent is success). One use case of none_failed_or_skipped is the join task in a workflow with nested BranchPythonOperator

Issue link: AIRFLOW-4453

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@yuqian90
Copy link
Contributor Author

@TV4Fun as discussed, this is the new PR. Please close the old one. Thank you!

airflow/operators/python.py Outdated Show resolved Hide resolved
Comment on lines 181 to 185
with create_session() as session:
branch_ti = session.query(TaskInstance).filter(
TaskInstance.dag_id == ti.dag_id,
TaskInstance.task_id == branch_operator.task_id,
TaskInstance.execution_date == ti.execution_date
).one_or_none()

if not branch_ti:
return

if branch_ti.state == State.SKIPPED:
raise AirflowSkipException(f"Skipping because parent task {branch_operator.task_id} "
"is skipped.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task/ti shouldn't be setting state in this manner but not sure what other option we have over here !

Would love to know opinions from other committers @potiuk @mik-laj @ashb @feluelle @BasPH

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kaxil. This is raising AirflowSkipException to cause the join task itself to be skipped. Please let me know if there are better ways to do this.

For other reviewers who haven't seen the previous discussion on the ancestor PR. What this create_branch_join does here is to make a join task like join_2 in the following DAG. It makes sure the workflow can still work as we intuitively expect after the behaviour or none_failed is fixed by this PR. Logically, join_2 should be skipped because its parent BranchPythonOperator branch_2 has been skipped. Without create_branch_join, the none_failed task would actually become success.

nested_branching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kaxil I've done the renaming you suggested to create_branch_join_task().

Any suggestion about how to make join_2 skipped other than raising AirflowSkipException?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of having 2 trigger rules?

  1. Fixed "none_failed"
  2. none_failed_or_skipped

(2) can be used to get current "non_failed" like feature, isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kaxil sorry for the late reply. Yes I think your suggestion sounds good. I'll think a bit more and work on fixing none_failed, and adding none_failed_or_skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaxil I have done the following. Could you take a second look? There are some elastic search related tests failing in Travis. Does not look related to my change.

  1. Fixed "none_failed"
  2. none_failed_or_skipped

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I will take a look today, can you rebase on the latest master please to fix the Elasticsearch issue

@yustoris
Copy link

yustoris commented Mar 8, 2020

Any progress here?

@yuqian90 yuqian90 force-pushed the investigate_none_failed branch 2 times, most recently from e028a29 to 4284078 Compare March 21, 2020 16:54
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot better, I like this change

@kaxil
Copy link
Member

kaxil commented Mar 24, 2020

Can you rebase to latest master to fix the CI

root and others added 8 commits March 25, 2020 08:00
…tation

The documentation for the `none_failed` trigger rule
(https://airflow.apache.org/docs/stable/concepts.html#trigger-rules)
describes its behavior as "all parents have not failed (`failed` or
`upstream_failed`) i.e. all parents have succeeded or been skipped."
With that definition in mind, there is no reason that the check for
`none_failed` should ever have to check for skipped upstream tasks or
set the current task to `skipped`. The current behavior causes the rule
to skip if all upstream tasks have skipped, which is more than a little
confusing. This fixes the behavior to be consistent with the documentation.
@codecov-io
Copy link

codecov-io commented Mar 25, 2020

Codecov Report

Merging #7464 into master will increase coverage by 21.44%.
The diff coverage is 96.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #7464       +/-   ##
===========================================
+ Coverage   65.44%   86.88%   +21.44%     
===========================================
  Files         927      928        +1     
  Lines       44963    44988       +25     
===========================================
+ Hits        29425    39087     +9662     
+ Misses      15538     5901     -9637     
Impacted Files Coverage Δ
airflow/example_dags/example_branch_operator.py 100.00% <ø> (ø)
airflow/models/baseoperator.py 95.68% <ø> (+28.13%) ⬆️
airflow/ti_deps/deps/trigger_rule_dep.py 94.04% <85.71%> (+73.26%) ⬆️
airflow/example_dags/example_nested_branch_dag.py 100.00% <100.00%> (ø)
airflow/utils/trigger_rule.py 100.00% <100.00%> (ø)
airflow/providers/google/cloud/operators/vision.py 99.63% <0.00%> (+0.36%) ⬆️
...osoft/azure/operators/azure_container_instances.py 78.28% <0.00%> (+0.65%) ⬆️
airflow/cli/cli_parser.py 98.13% <0.00%> (+0.93%) ⬆️
...rflow/providers/databricks/operators/databricks.py 92.24% <0.00%> (+1.72%) ⬆️
... and 492 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a15026f...74f565d. Read the comment docs.

@yuqian90 yuqian90 closed this Mar 25, 2020
@yuqian90 yuqian90 reopened this Mar 25, 2020
@kaxil kaxil merged commit f0c65e3 into apache:master Mar 26, 2020
kaxil pushed a commit that referenced this pull request Mar 27, 2020
…tation (#7464)

The documentation for the `none_failed` trigger rule
(https://airflow.apache.org/docs/stable/concepts.html#trigger-rules)
describes its behavior as "all parents have not failed (`failed` or
`upstream_failed`) i.e. all parents have succeeded or been skipped."
With that definition in mind, there is no reason that the check for
`none_failed` should ever have to check for skipped upstream tasks or
set the current task to `skipped`. The current behavior causes the rule
to skip if all upstream tasks have skipped, which is more than a little
confusing. This fixes the behavior to be consistent with the documentation.

Co-authored-by: root <[email protected]>
(cherry picked from commit f0c65e3)
kaxil pushed a commit that referenced this pull request Mar 27, 2020
…tation (#7464)

The documentation for the `none_failed` trigger rule
(https://airflow.apache.org/docs/stable/concepts.html#trigger-rules)
describes its behavior as "all parents have not failed (`failed` or
`upstream_failed`) i.e. all parents have succeeded or been skipped."
With that definition in mind, there is no reason that the check for
`none_failed` should ever have to check for skipped upstream tasks or
set the current task to `skipped`. The current behavior causes the rule
to skip if all upstream tasks have skipped, which is more than a little
confusing. This fixes the behavior to be consistent with the documentation.

Co-authored-by: root <[email protected]>
(cherry picked from commit f0c65e3)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Mar 30, 2020
…tation (apache#7464)

The documentation for the `none_failed` trigger rule
(https://airflow.apache.org/docs/stable/concepts.html#trigger-rules)
describes its behavior as "all parents have not failed (`failed` or
`upstream_failed`) i.e. all parents have succeeded or been skipped."
With that definition in mind, there is no reason that the check for
`none_failed` should ever have to check for skipped upstream tasks or
set the current task to `skipped`. The current behavior causes the rule
to skip if all upstream tasks have skipped, which is more than a little
confusing. This fixes the behavior to be consistent with the documentation.

Co-authored-by: root <[email protected]>
(cherry picked from commit f0c65e3)
kaxil pushed a commit that referenced this pull request Mar 30, 2020
…tation (#7464)

The documentation for the `none_failed` trigger rule
(https://airflow.apache.org/docs/stable/concepts.html#trigger-rules)
describes its behavior as "all parents have not failed (`failed` or
`upstream_failed`) i.e. all parents have succeeded or been skipped."
With that definition in mind, there is no reason that the check for
`none_failed` should ever have to check for skipped upstream tasks or
set the current task to `skipped`. The current behavior causes the rule
to skip if all upstream tasks have skipped, which is more than a little
confusing. This fixes the behavior to be consistent with the documentation.

Co-authored-by: root <[email protected]>
(cherry picked from commit f0c65e3)
@yuqian90 yuqian90 deleted the investigate_none_failed branch September 16, 2020 02:18
ArgentFalcon pushed a commit to lyft/airflow that referenced this pull request Nov 17, 2020
…tation (apache#7464)

The documentation for the `none_failed` trigger rule
(https://airflow.apache.org/docs/stable/concepts.html#trigger-rules)
describes its behavior as "all parents have not failed (`failed` or
`upstream_failed`) i.e. all parents have succeeded or been skipped."
With that definition in mind, there is no reason that the check for
`none_failed` should ever have to check for skipped upstream tasks or
set the current task to `skipped`. The current behavior causes the rule
to skip if all upstream tasks have skipped, which is more than a little
confusing. This fixes the behavior to be consistent with the documentation.

Co-authored-by: root <[email protected]>
ArgentFalcon added a commit to lyft/airflow that referenced this pull request Nov 18, 2020
…tation (apache#7464) (#235)

* [AIRFLOW-4453] Make behavior of `none_failed` consistent with documentation (apache#7464)

The documentation for the `none_failed` trigger rule
(https://airflow.apache.org/docs/stable/concepts.html#trigger-rules)
describes its behavior as "all parents have not failed (`failed` or
`upstream_failed`) i.e. all parents have succeeded or been skipped."
With that definition in mind, there is no reason that the check for
`none_failed` should ever have to check for skipped upstream tasks or
set the current task to `skipped`. The current behavior causes the rule
to skip if all upstream tasks have skipped, which is more than a little
confusing. This fixes the behavior to be consistent with the documentation.

Co-authored-by: root <[email protected]>

* Fix tests

Co-authored-by: yuqian90 <[email protected]>
Co-authored-by: root <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants