Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6043] Update dag reference to all tasks in sub_dag #6638

Merged
merged 1 commit into from
Dec 2, 2019
Merged

[AIRFLOW-6043] Update dag reference to all tasks in sub_dag #6638

merged 1 commit into from
Dec 2, 2019

Conversation

Khrol
Copy link
Contributor

@Khrol Khrol commented Nov 22, 2019

Graph UI is broken when tasks are filtered by root if
root is in the middle of chain of tasks.

The origin of the problem is in sub_dag method that keeps references
to the original DAG. deepcopy call is optimized and the optimization is
not 100% correct.

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Original issue:
t1 = DummyOperator(task_id="t1", dag=DAG)
t2 = DummyOperator(task_id="t2", dag=DAG)
t3 = DummyOperator(task_id="t3", dag=DAG)
t1 >> t2
t2 >> t3

Screen Shot 2019-11-22 at 17 04 00

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Graph UI is broken when tasks are filtered by root if
root is in the middle of chain of tasks.

The initial of the problem is in `sub_dag` method that keeps references
to the original DAG. `deepcopy` call is optimized and the optimization is
not 100% correct.
@Khrol Khrol marked this pull request as ready for review November 22, 2019 16:25
@codecov-io
Copy link

Codecov Report

Merging #6638 into master will decrease coverage by 0.24%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6638      +/-   ##
==========================================
- Coverage   83.71%   83.46%   -0.25%     
==========================================
  Files         672      672              
  Lines       37568    37566       -2     
==========================================
- Hits        31450    31355      -95     
- Misses       6118     6211      +93
Impacted Files Coverage Δ
airflow/models/dag.py 90.85% <100%> (ø) ⬆️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/refresh_config.py 50.98% <0%> (-23.53%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 76.38% <0%> (-22.23%) ⬇️
airflow/configuration.py 89.13% <0%> (-3.63%) ⬇️
airflow/models/taskinstance.py 93.46% <0%> (+0.14%) ⬆️
airflow/hooks/postgres_hook.py 94.36% <0%> (+1.4%) ⬆️
airflow/hooks/dbapi_hook.py 91.52% <0%> (+1.69%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e5685a...8256ca4. Read the comment docs.

t1 >> t2
t2 >> t3

sub_dag = dag.sub_dag('t2', include_upstream=True, include_downstream=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sense? This sub_dag is an odd beast that is similarly named but subtly different to an actual subdag via the SubDag operator.

When is this code path exercised?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not related to SubDag operator at all and is about completely different thing. Naming is very confusing, true.

This sub_dag method is called from
https://github.com/apache/airflow/blob/master/airflow/www/views.py#L1349 when Graph UI is filtered by root. It takes some mental effort to understand what happens in def graph to render corresponding view but the logic there is fine. The problem is in the incorrect deep_copy optimizations in sub_dag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when is "filter by root" passed in from the UI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://youtu.be/hNzwsutZX1E

This is very useful for big DAGs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I NEVER KNEW THAT FEATURE EXISTED!!!

@ashb ashb merged commit 6fbe683 into apache:master Dec 2, 2019
@Khrol Khrol deleted the AIRFLOW-6043_sub_dag_fix branch December 2, 2019 12:58
kaxil pushed a commit that referenced this pull request Dec 17, 2019
…ion of dag (#6638)

Graph UI is broken when tasks are filtered by root if
root is in the middle of chain of tasks.

The initial of the problem is in `sub_dag` method that keeps references
to the original DAG. `deepcopy` call is optimized and the optimization is
not 100% correct.

(cherry picked from commit 6fbe683)
kaxil pushed a commit that referenced this pull request Dec 18, 2019
…ion of dag (#6638)

Graph UI is broken when tasks are filtered by root if
root is in the middle of chain of tasks.

The initial of the problem is in `sub_dag` method that keeps references
to the original DAG. `deepcopy` call is optimized and the optimization is
not 100% correct.

(cherry picked from commit 6fbe683)
ashb pushed a commit that referenced this pull request Dec 19, 2019
…ion of dag (#6638)

Graph UI is broken when tasks are filtered by root if
root is in the middle of chain of tasks.

The initial of the problem is in `sub_dag` method that keeps references
to the original DAG. `deepcopy` call is optimized and the optimization is
not 100% correct.

(cherry picked from commit 6fbe683)
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
…ion of dag (apache#6638)

Graph UI is broken when tasks are filtered by root if
root is in the middle of chain of tasks.

The initial of the problem is in `sub_dag` method that keeps references
to the original DAG. `deepcopy` call is optimized and the optimization is
not 100% correct.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants