Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't schedule dummy tasks #7880

Merged
merged 3 commits into from
Mar 26, 2020
Merged

Conversation

mik-laj
Copy link
Member

@mik-laj mik-laj commented Mar 26, 2020

#7871


Issue link: WILL BE INSERTED BY boring-cyborg

Make sure to mark the boxes below before creating PR: [x]


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Mar 26, 2020
@codecov-io
Copy link

codecov-io commented Mar 26, 2020

Codecov Report

Merging #7880 into master will decrease coverage by 0.71%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7880      +/-   ##
==========================================
- Coverage   86.95%   86.23%   -0.72%     
==========================================
  Files         927      927              
  Lines       44974    44977       +3     
==========================================
- Hits        39107    38788     -319     
- Misses       5867     6189     +322     
Impacted Files Coverage Δ
airflow/operators/dummy_operator.py 100.00% <ø> (ø)
airflow/jobs/scheduler_job.py 90.92% <100.00%> (+0.47%) ⬆️
...flow/providers/apache/cassandra/hooks/cassandra.py 21.25% <0.00%> (-72.50%) ⬇️
...w/providers/apache/hive/operators/mysql_to_hive.py 35.84% <0.00%> (-64.16%) ⬇️
airflow/kubernetes/volume_mount.py 44.44% <0.00%> (-55.56%) ⬇️
airflow/providers/postgres/operators/postgres.py 47.82% <0.00%> (-52.18%) ⬇️
airflow/providers/redis/operators/redis_publish.py 50.00% <0.00%> (-50.00%) ⬇️
airflow/kubernetes/volume.py 52.94% <0.00%> (-47.06%) ⬇️
airflow/providers/mongo/sensors/mongo.py 53.33% <0.00%> (-46.67%) ⬇️
airflow/kubernetes/pod_launcher.py 47.18% <0.00%> (-45.08%) ⬇️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6fd74e...d851a2f. Read the comment docs.

@eladkal
Copy link
Contributor

eladkal commented Mar 26, 2020

I think it's a good idea that DummyOperator won't be scheduled but I don't agree with the statement "DummyOperator doesn't actually do work".

For example:
If the Dummy is marked always as success what will happen to cases where DummyOperator is used with trigger rules for some flows ?

Example:

final = DummyOperator(
                              dag=dag,
                              task_id="error_check",
                              trigger_rule=TriggerRule.ONE_FAILED
                            )
[task1, task2, task3] >> final

In this case marking final as FAILED result in failure of the DAG which invoke on_failure_callback.

Also it's possible in the above example that there is ExternalSensor from dag2 on final so it's important that final won't be marked as SUCCESS automatically.

Probably just need to update Updating.md alerting change of behavior as DummyOperator will not longer honor trigger_rule. Maybe even not allowing users to overwrite the default trigger_rule ?

@mik-laj
Copy link
Member Author

mik-laj commented Mar 26, 2020

But the trigger rules will still be respected. I added this code when the task is to be marked for execution by the scheduler and the trigger rules have already been checked. All callbacks will also be executed, because when the task has callbacks, it will still be passed to the scheduler.

This optimization applies only to cases when the task has no callbacks and when the decision to execute it has been made.

@@ -24,6 +24,9 @@ class DummyOperator(BaseOperator):
"""
Operator that does literally nothing. It can be used to group tasks in a
DAG.

The task that uses this operator is automatically marked as successfully completed by the scheduler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mik-laj This is what confused me.
Maybe better saying something like:
The task is evaluated by the scheduler but never processed by the executor
WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Might be worth adding!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the description.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small update to description needed but otherwise good.

@@ -24,6 +24,9 @@ class DummyOperator(BaseOperator):
"""
Operator that does literally nothing. It can be used to group tasks in a
DAG.

The task that uses this operator is automatically marked as successfully completed by the scheduler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Might be worth adding!

@mik-laj mik-laj merged commit d87c59d into apache:master Mar 26, 2020
@mik-laj mik-laj changed the title Don't schedule dummy task Don't schedule dummy tasks Mar 26, 2020
@ashb ashb added this to the Airflow 1.10.10 milestone Mar 26, 2020
kaxil pushed a commit that referenced this pull request Mar 27, 2020
(cherry picked from commit d87c59d)
kaxil pushed a commit that referenced this pull request Mar 27, 2020
(cherry picked from commit d87c59d)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Mar 30, 2020
kaxil pushed a commit that referenced this pull request Mar 30, 2020
(cherry picked from commit d87c59d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:performance area:Scheduler including HA (high availability) scheduler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants