[AIRFLOW-6897] Simplify DagFileProcessorManager #7521

Merged (6 commits) on Feb 24, 2020

Conversation

mik-laj (Member) commented Feb 24, 2020

Extract start_new_processes and prepare_file_path_queue
These functions are independent of each other, and as separate methods they are easier to understand. A good method name is the best documentation.
I also renamed _heartbeat_count to _no_run because it better describes the role. This variable can increase without affecting the heartbeat when DagFileProcessorManager is running in asynchronous mode.
Move emit_metrics call
It is not related to generating file_path_queue, so the caller should invoke it.
Inline heartbeat method
This method serves no purpose, because a full loop is performed every time. It only hides the logic and makes the code harder to understand.
Move _kill_timed_out_processors to loop
It is not related to collecting results in collect_results, so the caller should invoke it.

Now I have the impression that the sequence of operations is much easier to understand, because it is found only in the start method.

In this PR I do not want to change the sequence of operations. It is only refactoring without functional changes.
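
To make the description above concrete, here is a minimal sketch of a manager whose whole sequence of operations lives in `start`. It is a toy stand-in, not the real `DagFileProcessorManager`: the class name `ManagerSketch`, the `parallelism` and `max_runs` parameters, the `calls` list, and the exact order of steps inside the loop are assumptions made for illustration and are not taken from the PR diff. Only the method names mentioned in the description (and the counter, shown here under its final name `_num_run`) come from the PR.

```python
from collections import deque
from typing import Deque, Dict, List


class ManagerSketch:
    """Toy stand-in for DagFileProcessorManager (hypothetical, not Airflow code)."""

    def __init__(self, file_paths: List[str], parallelism: int = 2, max_runs: int = 1):
        self._file_paths = file_paths
        self._parallelism = parallelism  # hypothetical cap on concurrent processors
        self._max_runs = max_runs        # hypothetical stop condition for the loop
        self._file_path_queue: Deque[str] = deque()
        self._processors: Dict[str, str] = {}  # file path -> fake processor state
        self._num_run = 0                # counts loop iterations; a counter, not a flag
        self.calls: List[str] = []       # records the order in which steps ran

    def _kill_timed_out_processors(self) -> None:
        self.calls.append("kill_timed_out_processors")

    def collect_results(self) -> List[str]:
        # Pretend every running processor finished since the last iteration.
        self.calls.append("collect_results")
        finished = list(self._processors)
        self._processors.clear()
        return finished

    def emit_metrics(self) -> None:
        self.calls.append("emit_metrics")

    def prepare_file_path_queue(self) -> None:
        self.calls.append("prepare_file_path_queue")
        self._file_path_queue.extend(self._file_paths)

    def start_new_processes(self) -> None:
        self.calls.append("start_new_processes")
        while self._file_path_queue and len(self._processors) < self._parallelism:
            path = self._file_path_queue.popleft()
            self._processors[path] = "running"

    def start(self) -> None:
        # The full sequence is visible here; there is no separate heartbeat() wrapper.
        while self._num_run < self._max_runs:
            self._kill_timed_out_processors()
            self.collect_results()
            if not self._file_path_queue:
                self.emit_metrics()
                self.prepare_file_path_queue()
            self.start_new_processes()
            self._num_run += 1
```

Calling `ManagerSketch(["a.py", "b.py", "c.py"], parallelism=2, max_runs=2).start()` and then inspecting `calls` shows the order of steps per iteration, which is the readability benefit the description argues for: the caller, not the helper methods, decides when metrics are emitted and when timed-out processors are killed.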


Issue link: AIRFLOW-6897

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

codecov-io commented Feb 24, 2020

Codecov Report

Merging #7521 into master will decrease coverage by 0.88%.
The diff coverage is 88.15%.

@@            Coverage Diff             @@
##           master    #7521      +/-   ##
==========================================
- Coverage   86.81%   85.92%   -0.89%     
==========================================
  Files         893      893              
  Lines       42193    42229      +36     
==========================================
- Hits        36629    36285     -344     
- Misses       5564     5944     +380
| Impacted Files | Coverage Δ | |
|---|---|---|
| airflow/bin/cli.py | 94.73% <ø> (ø) | ⬆️ |
| airflow/cli/commands/dag_command.py | 84.71% <100%> (ø) | ⬆️ |
| airflow/providers/amazon/aws/operators/sns.py | 100% <100%> (ø) | ⬆️ |
| airflow/models/dagrun.py | 95.81% <66.66%> (-0.76%) | ⬇️ |
| airflow/www/views.py | 76.19% <76.19%> (-0.1%) | ⬇️ |
| airflow/providers/amazon/aws/hooks/sns.py | 96.42% <94.11%> (-3.58%) | ⬇️ |
| airflow/utils/dag_processing.py | 87.95% <96.42%> (+0.02%) | ⬆️ |
| ...flow/providers/apache/cassandra/hooks/cassandra.py | 21.51% <0%> (-72.16%) | ⬇️ |
| ...w/providers/apache/hive/operators/mysql_to_hive.py | 35.84% <0%> (-64.16%) | ⬇️ |
| airflow/kubernetes/volume_mount.py | 44.44% <0%> (-55.56%) | ⬇️ |

... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0ec2774...8c1feb4.

@@ -569,7 +569,7 @@ def __init__(self,
# Map from file path to the processor
self._processors: Dict[str, AbstractDagFileProcessorProcess] = {}

- self._heartbeat_count = 0
+ self._no_run = 0
Member

Suggested change
- self._no_run = 0
+ self._num_run = 0

Please? Seeing _no_run I expected it to be a boolean/flag variable.

Member Author


I renamed this variable.

ashb (Member) left a comment

Other than changing no -> num this looks good, and it removes the confusion of having a "heartbeat" method that is unlike the heartbeat method on Jobs (this one doesn't update a row in the DB like the other).

Pre-emptive approve, but please change the variable name.

@mik-laj mik-laj merged commit 83d826b into apache:master Feb 24, 2020
petedejoy pushed a commit to petedejoy/airflow that referenced this pull request Feb 24, 2020
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
dimberman pushed a commit to dimberman/airflow that referenced this pull request Jun 29, 2020
kaxil pushed a commit that referenced this pull request Jun 30, 2020
@kaxil kaxil added the type:improvement Changelog: Improvements label Jul 1, 2020
@kaxil kaxil added this to the Airflow 1.10.11 milestone Jul 1, 2020
kaxil pushed a commit that referenced this pull request Jul 1, 2020
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
4 participants