[AIRFLOW-6495] Load DAG only once when running a task using StandardTaskRunner #7090

Merged 1 commit on Jan 9, 2020

@mik-laj (Member) commented Jan 7, 2020

If the process is created using a fork, it shares all modules that were loaded before forking. This allows us to avoid loading the DAG again. Loading a DAG is sometimes costly because it performs database queries or other expensive operations, so we should limit it where possible.


Issue link: AIRFLOW-6495

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jan 7, 2020
@codecov-io commented Jan 7, 2020

Codecov Report

Merging #7090 into master will decrease coverage by 0.28%.
The diff coverage is 50%.


@@            Coverage Diff             @@
##           master    #7090      +/-   ##
==========================================
- Coverage   85.07%   84.79%   -0.29%     
==========================================
  Files         680      680              
  Lines       38810    38811       +1     
==========================================
- Hits        33019    32910     -109     
- Misses       5791     5901     +110

| Impacted Files | Coverage Δ |
| --- | --- |
| airflow/task/task_runner/standard_task_runner.py | 67.21% <50%> (+0.54%) ⬆️ |
| airflow/kubernetes/volume_mount.py | 44.44% <0%> (-55.56%) ⬇️ |
| airflow/kubernetes/volume.py | 52.94% <0%> (-47.06%) ⬇️ |
| airflow/kubernetes/pod_launcher.py | 45.25% <0%> (-46.72%) ⬇️ |
| airflow/kubernetes/refresh_config.py | 50.98% <0%> (-23.53%) ⬇️ |
| ...rflow/contrib/operators/kubernetes_pod_operator.py | 78.75% <0%> (-20%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 748e3c3...6e56595.

@mik-laj mik-laj merged commit 573867c into apache:master Jan 9, 2020
kaxil pushed a commit that referenced this pull request Feb 3, 2020
kaxil pushed a commit to astronomer/airflow that referenced this pull request Feb 27, 2020
…askRunner (apache#7090)

(cherry picked from commit 573867c)
(cherry picked from commit b9846e4)
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
@turbaszek (Member) commented:

@mik-laj I'm afraid this won't work with the --pickle flag:

    if dag and args.pickle:
        raise AirflowException("You cannot use the --pickle option when using DAG.cli() method.")
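
To make the concern concrete, here is a self-contained sketch of how a guard with this condition behaves. It is an illustration only: `AirflowException` is redefined locally as a stand-in so the snippet runs without Airflow installed, and `check_pickle_guard` is a hypothetical wrapper, not a function from the codebase.

```python
from argparse import Namespace


class AirflowException(Exception):
    """Local stand-in for Airflow's exception class (assumption for this sketch)."""


def check_pickle_guard(args, dag=None):
    """Hypothetical wrapper around the condition quoted above."""
    # An already-loaded DAG combined with a pickle id is rejected.
    if dag and args.pickle:
        raise AirflowException(
            "You cannot use the --pickle option when using DAG.cli() method."
        )
    return "ok"


if __name__ == "__main__":
    loaded_dag = object()  # placeholder for a DAG loaded before the fork

    # Without --pickle, passing the preloaded DAG is accepted.
    print(check_pickle_guard(Namespace(pickle=None), dag=loaded_dag))

    # With --pickle set, the same call raises.
    try:
        check_pickle_guard(Namespace(pickle=42), dag=loaded_dag)
    except AirflowException as exc:
        print("rejected:", exc)
```

Under this reading, a runner that always passes the preloaded DAG would hit the exception whenever a pickle id is supplied, so the two options would need to be handled separately.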
