[AIRFLOW-5268] Apply same DAG naming conventions as in literature #5874

Merged: 1 commit, Aug 21, 2019


@BasPH (Contributor) commented Aug 20, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-5268
    • In case you are fixing a typo in the documentation, you can prepend your commit with [AIRFLOW-XXX]; code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

The Airflow codebase is very confusing on this point: the concept of a "root" node in Airflow is actually implemented as the last, finishing node of a DAG, while in DAG literature root nodes are the first nodes to execute. Put differently, the literature defines root nodes as nodes without upstream dependencies, whereas Airflow implemented them as nodes without downstream dependencies.

This PR aligns Airflow's implementation of a "root" node with DAG literature. I've also implemented a leaves property to make a clear distinction between first (starting) nodes and last (finishing) nodes. To my surprise, there weren't any tests for these basic properties, so I added 3 tests verifying the behaviour of roots and leaves.
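To make the two definitions concrete, here is a minimal sketch of roots and leaves as properties. MiniDAG, Task, add_task and set_dependency are hypothetical simplifications chosen for illustration; this is not Airflow's actual DAG implementation.

```python
from typing import Dict, List, Set


class Task:
    """Hypothetical minimal task: an id plus its dependency sets."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream_task_ids: Set[str] = set()
        self.downstream_task_ids: Set[str] = set()


class MiniDAG:
    """Hypothetical stand-in for a DAG that models only dependencies."""

    def __init__(self) -> None:
        self.task_dict: Dict[str, Task] = {}

    def add_task(self, task_id: str) -> Task:
        self.task_dict[task_id] = Task(task_id)
        return self.task_dict[task_id]

    def set_dependency(self, upstream_id: str, downstream_id: str) -> None:
        # Record the edge on both ends so either direction can be queried.
        self.task_dict[upstream_id].downstream_task_ids.add(downstream_id)
        self.task_dict[downstream_id].upstream_task_ids.add(upstream_id)

    @property
    def roots(self) -> List[Task]:
        # Literature definition: no upstream dependencies, so these run first.
        return [t for t in self.task_dict.values() if not t.upstream_task_ids]

    @property
    def leaves(self) -> List[Task]:
        # No downstream dependencies, so these run last.
        return [t for t in self.task_dict.values() if not t.downstream_task_ids]


dag = MiniDAG()
for task_id in ("extract", "transform", "load", "audit"):
    dag.add_task(task_id)
dag.set_dependency("extract", "transform")
dag.set_dependency("transform", "load")
dag.set_dependency("extract", "audit")

print([t.task_id for t in dag.roots])   # ['extract']
print([t.task_id for t in dag.leaves])  # ['load', 'audit']
```

Note that under these definitions a task with no dependencies at all is both a root and a leaf.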

The implementation involved:

  • Correctly implementing the definition of root and leaf nodes
  • Adding tests verifying this behaviour
  • Switching some upstream/downstream calls in code that relied on the old, inverted definition
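For illustration, consider code that needs a DAG's final tasks, such as a rule that decides a run's outcome. Under the old naming such code would ask for "roots"; with literature naming it asks for leaves. The rule below is a hypothetical simplification (leaf_ids, run_state and the state strings are illustrative, not Airflow's DagRun code): a run is finished only when every task is finished, and it fails if any leaf task failed.

```python
from typing import Dict, List, Set

FINISHED = {"success", "failed", "upstream_failed", "skipped"}
FAILED = {"failed", "upstream_failed"}


def leaf_ids(downstream: Dict[str, List[str]]) -> Set[str]:
    """Tasks with no downstream dependencies, i.e. the final tasks."""
    return {task_id for task_id, children in downstream.items() if not children}


def run_state(downstream: Dict[str, List[str]], states: Dict[str, str]) -> str:
    """Simplified run-state rule decided by the leaf (final) tasks."""
    if not all(states.get(t) in FINISHED for t in downstream):
        return "running"
    if any(states[t] in FAILED for t in leaf_ids(downstream)):
        return "failed"
    return "success"


downstream = {
    "extract": ["transform", "audit"],
    "transform": ["load"],
    "load": [],
    "audit": [],
}
all_ok = {"extract": "success", "transform": "success", "load": "success", "audit": "success"}
broken = {"extract": "success", "transform": "failed", "load": "upstream_failed", "audit": "success"}
partial = {"extract": "success", "transform": "running"}

print(run_state(downstream, all_ok))   # success
print(run_state(downstream, broken))   # failed
print(run_state(downstream, partial))  # running
```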

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Added 3 tests, see above.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to an appropriate release

Code Quality

  • Passes flake8

airflow/models/dag.py: 3 review threads (outdated, resolved)
airflow/models/dagrun.py: 3 review threads (outdated, resolved)
@BasPH (Contributor, Author) commented Aug 20, 2019

@Fokko applied all your suggestions, fingers crossed for the CI

@kaxil (Member) left a comment

Awesome work @BasPH

@Fokko merged commit 22aa76b into apache:master on Aug 21, 2019
@BasPH deleted the bash-rectify-dag-conventions-5268 branch on August 21, 2019 at 07:46
Jerryguo pushed a commit to Jerryguo/airflow that referenced this pull request Sep 2, 2019
kaxil pushed a commit to astronomer/airflow that referenced this pull request Oct 23, 2019
schnie pushed a commit to astronomer/airflow that referenced this pull request Oct 24, 2019
…webserver scalability (#67)

* [AIRFLOW-5088][AIP-24] Add DAG serialization using JSON (apache#5701)

It implements the method proposed in AIP-24 to serialize DAG. It will be used in DAG persistency in DB to solve webserver scalability issue.

(cherry picked from commit 2bd1a51ec75f680a6e6e2101bd948a78421a644a)
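As a rough illustration of the idea (serialize a DAG to JSON once, persist it in a database table, and let the webserver read it back without parsing DAG files), here is a standard-library sketch. The JSON schema, the table named serialized_dag and the helpers to_json, save and load are assumptions for illustration; the real SerializedDAG format carries far more (operators, arguments, schedule).

```python
import json
import sqlite3
from typing import Any, Dict, List


def to_json(dag_id: str, downstream: Dict[str, List[str]]) -> str:
    """Encode a DAG as JSON: each task id maps to its downstream task ids."""
    return json.dumps({"dag_id": dag_id, "tasks": downstream}, sort_keys=True)


def save(conn: sqlite3.Connection, dag_id: str, data: str) -> None:
    # Insert or overwrite the serialized form for this DAG.
    conn.execute(
        "INSERT OR REPLACE INTO serialized_dag (dag_id, data) VALUES (?, ?)",
        (dag_id, data),
    )


def load(conn: sqlite3.Connection, dag_id: str) -> Dict[str, Any]:
    # Reader side (e.g. a webserver): no DAG file parsing, just JSON decoding.
    row = conn.execute(
        "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    if row is None:
        raise KeyError(dag_id)
    return json.loads(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT)")
save(conn, "etl", to_json("etl", {"extract": ["transform"], "transform": ["load"], "load": []}))
restored = load(conn, "etl")
print(restored["tasks"]["transform"])  # ['load']
```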

* [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability (apache#5743)

* Make _primitive_types Py2 & Py3 compatible

(cherry picked from commit 7ce34b2a959fc1f8322836f38f474a831e4901a1)

* Fix issue with different class for Pendulum Timezones

(cherry picked from commit c068c67c48d294a589b58be0d0ad8b657c361a77)
(cherry picked from commit 04fbf2beac57dcf26b118ebbe5a2bf175ce08af8)

* Update timezone class & Do not serialize dates in tasks if they have matching date in DAG

(cherry picked from commit be412522cb95a19a51b2f208ae8ebea76e8b667a)
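The size-saving idea of not serializing a task's dates when they match the DAG's can be sketched as follows. Tasks and DAGs are plain dicts here, DATE_FIELDS is an assumed list of deduplicated fields, and datetimes are naive; the timezone handling mentioned in these commits is deliberately left out.

```python
from datetime import datetime
from typing import Any, Dict

# Assumption: only these fields are deduplicated against the DAG.
DATE_FIELDS = ("start_date", "end_date")


def serialize_task(task: Dict[str, Any], dag: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize a task, omitting date fields that equal the DAG's value."""
    out: Dict[str, Any] = {}
    for key, value in task.items():
        if key in DATE_FIELDS and key in dag and value == dag[key]:
            continue  # Redundant: restored from the DAG on load.
        out[key] = value.isoformat() if isinstance(value, datetime) else value
    return out


def deserialize_task(data: Dict[str, Any], dag: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild a task, filling omitted date fields from the DAG."""
    task = dict(data)
    for key in DATE_FIELDS:
        if key not in task:
            if key in dag:
                task[key] = dag[key]
        elif task[key] is not None:
            task[key] = datetime.fromisoformat(task[key])
    return task


dag = {"dag_id": "etl", "start_date": datetime(2019, 8, 1), "end_date": None}
task_a = {"task_id": "extract", "start_date": datetime(2019, 8, 1), "end_date": None}
task_b = {"task_id": "load", "start_date": datetime(2019, 8, 15), "end_date": None}

print(serialize_task(task_a, dag))  # {'task_id': 'extract'}
print(serialize_task(task_b, dag))  # {'task_id': 'load', 'start_date': '2019-08-15T00:00:00'}
```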

* Change type of data column to JSON & Add metric for dagbag size

(cherry picked from commit d030b10bec9cd0e468f36e97e131d497d5a43fc6)

* Code Cleanup for JSON columns

- Code Cleanup for JSON columns
- Test code to allow old mysql & sqlite versions

(cherry picked from commit 1db8044f9d29edf25f2b8ad4cd21c496c243534a)

* Add Debug info

(cherry picked from commit d14497ff28d123d45d626019cabcbd977c5de79d)

* Reduce Sizing of SerializedDAG

* Support dateutil.relativedelta in SerializedDAGs

This was a valid type for schedule_interval already, so we should
continue supporting it

(cherry picked from commit ec9d705f1a90790bdcb099196269c77d3cc3d53c)
(cherry picked from commit 9805b4a183b87976dc33ae80c7e6a209849ba5d7)
(cherry picked from commit f00d9237cd9224571e43bda67ad4dddfb009c402)

* Add specific test for schedule_interval serialization

* Delete non-existent Dags

(cherry picked from commit 92d442d33dd8c81ea73026405d3978d133140807)
(cherry picked from commit 7d371d329613c48deef0d8a812c817f2013db8f9)

* Remove comment

(cherry picked from commit 549c1f9cd9ab0bfeac4f75fa713cbaae842a6e82)

* Fix bugs where date/time/IntEnum values were not supported in serialization

(cherry picked from commit d0ce27e3f3b6046016800855ad2e57fa67d8b57f)
(cherry picked from commit 50a60b6a026e6d6249f069944be86560d87a67ca)

* Deactivate DAGs instead of deleting if their DAG file is deleted

(cherry picked from commit 5a84ca517cef0dacff23f57b360a554b461b5034)
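Deactivating rather than deleting can be sketched as flipping a flag on the DAG's metadata row when its file is missing, so the row itself is preserved. DagRecord and deactivate_missing are hypothetical names for this sketch, not Airflow's model or scheduler code.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DagRecord:
    """Hypothetical metadata row for a DAG; only the fields needed here."""
    dag_id: str
    fileloc: str
    is_active: bool = True


def deactivate_missing(records: Dict[str, DagRecord]) -> List[str]:
    """Mark DAGs whose file no longer exists as inactive instead of deleting them."""
    deactivated = []
    for record in records.values():
        if record.is_active and not os.path.exists(record.fileloc):
            record.is_active = False  # the row stays; only the flag changes
            deactivated.append(record.dag_id)
    return deactivated


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present_dag.py")
    with open(present, "w") as f:
        f.write("# placeholder DAG file\n")
    records = {
        "present": DagRecord("present", present),
        "gone": DagRecord("gone", os.path.join(tmp, "gone_dag.py")),
    }
    deactivated = deactivate_missing(records)

print(deactivated)                # ['gone']
print(records["gone"].is_active)  # False
print(sorted(records))            # ['gone', 'present']
```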

* Fix CI

(cherry picked from commit 712ff47cbada7373eaa4fa92bb9220a453c445ae)
(cherry picked from commit 52a0e9e39dc006501eb9d8ac0881357900548cf7)

* Just-in-time loading of DagBag in webserver

To save start-up time (and memory), this changes the DagBag so it is
not populated by the webserver on start-up; when a specific DAG is
requested, it is loaded on demand from the SerializedDAG table.

Co-Authored-By: Ash Berlin-Taylor
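Just-in-time loading amounts to a cache that starts empty and fills on first request. LazyDagBag and its fetch callback are hypothetical: the method name get_dag is borrowed from DagBag for readability, and a dict stands in for the SerializedDAG table.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class LazyDagBag:
    """Hypothetical webserver-side DagBag that loads DAGs lazily.

    Nothing is loaded at start-up; each DAG is fetched from the serialized
    store the first time it is requested and then cached in memory.
    """

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch  # returns the serialized JSON for a dag_id, or None
        self.dags: Dict[str, Dict[str, Any]] = {}

    def get_dag(self, dag_id: str) -> Optional[Dict[str, Any]]:
        if dag_id not in self.dags:
            data = self._fetch(dag_id)
            if data is None:
                return None  # unknown DAG; not cached, so a later lookup retries
            self.dags[dag_id] = json.loads(data)
        return self.dags[dag_id]


# A dict plays the role of the SerializedDAG table in this sketch.
store = {"etl": '{"dag_id": "etl"}'}
calls: List[str] = []


def fetch(dag_id: str) -> Optional[str]:
    calls.append(dag_id)
    return store.get(dag_id)


bag = LazyDagBag(fetch)
print(bag.dags)   # {}
bag.get_dag("etl")
bag.get_dag("etl")
print(calls)      # ['etl']
```

The second get_dag call is served from memory, which is where the start-up time and memory savings come from: only DAGs that are actually viewed get deserialized.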

* Fix flake8

(cherry picked from commit e91ad24b006823eadd6f3e21fc7cc5c8dd57b0d1)

* Add default args to decorated_fields

(cherry picked from commit 3f08d2f986364315c3e43bde3524f12d069392ae)

* [AIRFLOW-5636] Allow adding or overriding existing Operator Links (apache#6302)

(cherry-picked from 1de210b)

* Add support for OperatorLinks

ExtraOperatorLinks are supported if Plugins are registered for them

(cherry picked from commit 9cb6e28)
(cherry picked from commit 72c75860ecfcd1930f1dedc7a0c713f122ea51a5)

* Cleanup

(cherry picked from commit e840616)
(cherry picked from commit 6d01d8e5bac1b6e829b9da6fc50c1a4b6d23bcaf)

* Move serialization directory out of dags folder

(cherry picked from commit 8a07aee3e5cf133c45ee4ae26aad6104c84502ab)

* Update path

* [AIRFLOW-5268] Apply same DAG naming conventions as in literature (apache#5874)

cherry-picked from apache@8f6ca53

* [AIRFLOW-4309] Remove Broken Dag error after Dag is deleted (apache#6102)

(cherry picked from commit 3140c45)
(cherry picked from commit df65f8e)

* [AIRFLOW-5481] Allow Deleting Renamed DAGs (apache#6101)

(cherry picked from commit 99a5c2e)

* Fix bad merge_conflict resolution

This was incorrectly removed while cherry-picking and resolving conflicts

* Add test for relativedelta

* Fix Import

* Backport for Py2

3 participants