[AIRFLOW-5088][AIP-24][BackPort] Persisting serialized DAG in DB for webserver scalability #5992

Merged
merged 24 commits into from
Oct 25, 2019

Conversation

kaxil
Member

@kaxil kaxil commented Sep 3, 2019

Make sure you have checked all steps below.

Jira

Description

The goal is to decouple the webserver from the DAG folder so that it reads everything from the database instead.

Rendering templates via functions is an exception: in that case the webserver needs to re-import the DAG, because functions are stringified in the serialized DAG.
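
To illustrate why stringified functions force a re-import, here is a minimal sketch. It is not the serialization code from this PR; `serialize_value`, `render_greeting`, and the `task` dict are hypothetical names chosen for illustration only. It assumes a DAG-like structure is written out as JSON and that any callable is replaced by a string.

```python
import json


def serialize_value(value):
    """Return a JSON-safe copy of value; callables are replaced by strings."""
    if callable(value):
        # Hypothetical string form: only the name survives, not the code.
        return "<callable {}>".format(getattr(value, "__name__", repr(value)))
    if isinstance(value, dict):
        return {key: serialize_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_value(item) for item in value]
    return value


def render_greeting(name):
    return "hello " + name


# A stand-in for one task of a DAG; the field names are assumptions.
task = {
    "task_id": "greet",
    "template": "{{ ds }}",
    "python_callable": render_greeting,
}

# What would be stored in the database and later read back by the webserver.
blob = json.dumps(serialize_value(task), sort_keys=True)
restored = json.loads(blob)
```

After the round trip, `restored["python_callable"]` is a plain string and can no longer be called, so anything that has to execute the function needs the original DAG file.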

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain docstrings that explain what they do
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to an appropriate release

Code Quality

  • Passes flake8

cc @coufon

@codecov-io

codecov-io commented Oct 3, 2019

Codecov Report

Merging #5992 into v1-10-test will increase coverage by 2.57%.
The diff coverage is 79.27%.


@@              Coverage Diff               @@
##           v1-10-test    #5992      +/-   ##
==============================================
+ Coverage       76.79%   79.36%   +2.57%     
==============================================
  Files             511      518       +7     
  Lines           34734    35238     +504     
==============================================
+ Hits            26673    27966    +1293     
+ Misses           8061     7272     -789

Impacted Files                       Coverage Δ
airflow/serialization/__init__.py    100% <100%> (ø)
airflow/models/baseoperator.py       94.77% <100%> (+0.28%) ⬆️
airflow/settings.py                  85.71% <100%> (+0.15%) ⬆️
airflow/models/__init__.py           100% <100%> (ø) ⬆️
airflow/serialization/enums.py       100% <100%> (ø)
airflow/utils/dag_processing.py      58.69% <16.66%> (-0.25%) ⬇️
airflow/www/views.py                 45.01% <27.27%> (-0.1%) ⬇️
airflow/models/dagbag.py             84.95% <34.37%> (-7.21%) ⬇️
airflow/www/utils.py                 82.65% <60%> (-0.69%) ⬇️
airflow/models/dag.py                92.07% <80%> (-0.91%) ⬇️
... and 61 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4ea227f...afa60a5.

@kaxil
Member Author

kaxil commented Oct 3, 2019

ToDo:

  • Graph View and Tree View are currently broken because "[AIRFLOW-5268] Apply same DAG naming conventions as in literature" (#5874) has not been merged to the v1-10-* branches yet. Both views use _upstream_tasks to create an edge between tasks, so we need to either include the _upstream_tasks attribute in the serialized JSON or backport that PR to the v1-10-* branches. Done: backported that PR in 9b5fa1abb3970213c0ba68a0b212fc6dfdbad4db
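
As a rough illustration of why the views depend on upstream information, the sketch below derives graph edges from a per-task set of upstream task ids. It is not the code in airflow/www/views.py; `edges_from_upstream` and the sample mapping are hypothetical, and the only assumption taken from the comment above is that edges are built from each task's upstream tasks.

```python
def edges_from_upstream(upstream_by_task):
    """Build (upstream, downstream) edges from a task -> upstream ids mapping."""
    edges = []
    for task_id in sorted(upstream_by_task):
        for upstream_id in sorted(upstream_by_task[task_id]):
            edges.append((upstream_id, task_id))
    return edges


# Hypothetical three-task DAG: extract -> transform -> load.
upstream_by_task = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

edges = edges_from_upstream(upstream_by_task)
```

If the serialized form carries no upstream information at all, the mapping is empty for every task and no edges can be drawn, which matches the breakage described above.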

ashb and others added 13 commits October 25, 2019 21:54
This was a valid type for schedule_interval already, so we should
continue supporting it

(cherry picked from commit ec9d705f1a90790bdcb099196269c77d3cc3d53c)
(cherry picked from commit 9805b4a183b87976dc33ae80c7e6a209849ba5d7)
(cherry picked from commit 92d442d33dd8c81ea73026405d3978d133140807)
(cherry picked from commit d0ce27e3f3b6046016800855ad2e57fa67d8b57f)
(cherry picked from commit 712ff47cbada7373eaa4fa92bb9220a453c445ae)
To save start-up time (and memory), this changes the DagBag so that
the webserver no longer populates it on start-up; when a specific DAG
is requested, it is loaded on demand from the SerializedDAG table.

Co-Authored-By: Ash Berlin-Taylor <[email protected]>
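
The on-demand loading described in the commit message above can be sketched as follows. This is a simplified illustration, not the DagBag implementation from this PR: `LazyDagBag`, `load_from_table`, and `fake_table` are hypothetical, and a plain dict stands in for the SerializedDAG table.

```python
class LazyDagBag:
    """Holds DAGs in memory only after they have been asked for."""

    def __init__(self, load_serialized):
        # load_serialized is a hypothetical lookup: dag_id -> dict or None.
        self._load_serialized = load_serialized
        self.dags = {}  # starts empty; nothing is loaded at start-up

    def get_dag(self, dag_id):
        if dag_id not in self.dags:
            row = self._load_serialized(dag_id)
            if row is None:
                return None
            self.dags[dag_id] = row
        return self.dags[dag_id]


# Stand-in for the SerializedDAG table.
fake_table = {"example": {"dag_id": "example", "tasks": ["a", "b"]}}
lookups = []


def load_from_table(dag_id):
    lookups.append(dag_id)  # record each simulated database read
    return fake_table.get(dag_id)


bag = LazyDagBag(load_from_table)
```

Calling `bag.get_dag("example")` twice triggers only one simulated database read, because the first result is kept in `bag.dags`.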
ExtraOperatorLinks are supported if Plugins are registered for them

(cherry picked from commit 9cb6e28)
(cherry picked from commit e840616)
@kaxil kaxil merged this pull request into apache:v1-10-test Oct 25, 2019
@kaxil kaxil deleted the dag-serialization-1-10 branch October 25, 2019 22:13
kaxil added a commit that referenced this pull request Oct 25, 2019
…scalability (#5992)

Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-Authored-By: Zhou Fang <[email protected]>
@potiuk
Member

potiuk commented Oct 26, 2019

🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉

kaxil added a commit to astronomer/airflow that referenced this pull request Oct 30, 2019
potiuk pushed a commit that referenced this pull request Nov 12, 2019
…scalability (#5992)

Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-Authored-By: Zhou Fang <[email protected]>
eladkal pushed a commit to eladkal/airflow that referenced this pull request Dec 2, 2019
…scalability (apache#5992)

Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-Authored-By: Zhou Fang <[email protected]>
kaxil added a commit that referenced this pull request Dec 12, 2019
…scalability (#5992)

Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-Authored-By: Zhou Fang <[email protected]>
@tooptoop4
Contributor

The cfg points to a broken link: https://airflow.apache.org/howto/enable-dag-serialization.html

@kaxil
Member Author

kaxil commented Dec 24, 2019 via email

@kaxil
Member Author

kaxil commented Dec 24, 2019 via email

@tooptoop4
Contributor

tooptoop4 commented Dec 26, 2019

For a .py file that generates multiple dynamic DAGs, how can parsing the DAG be avoided? @kaxil
