[AIRFLOW-6897] Simplify DagFileProcessorManager #7521
Codecov Report
@@ Coverage Diff @@
## master #7521 +/- ##
==========================================
- Coverage 86.81% 85.92% -0.89%
==========================================
Files 893 893
Lines 42193 42229 +36
==========================================
- Hits 36629 36285 -344
- Misses 5564 5944 +380
Continue to review full report at Codecov.
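The Coverage row in the report above is the share of measured lines that were hit. The short check below recomputes it from the Hits and Lines rows, assuming no partially covered lines (the report lists no Partials row). The helper name coverage_percent is illustrative only, not part of Codecov.

```python
# Recompute the Codecov "Coverage" row from the Hits and Lines rows.
# Assumption: no partially covered lines, since the report shows none.
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage rounded to two decimals."""
    return round(100.0 * hits / lines, 2)


base = {"lines": 42193, "hits": 36629, "misses": 5564}  # master
head = {"lines": 42229, "hits": 36285, "misses": 5944}  # #7521

for name, report in (("master", base), ("#7521", head)):
    # Every measured line is either a hit or a miss.
    assert report["hits"] + report["misses"] == report["lines"]
    print(name, coverage_percent(report["hits"], report["lines"]))

delta = round(
    coverage_percent(head["hits"], head["lines"])
    - coverage_percent(base["hits"], base["lines"]),
    2,
)
print("delta", delta)
```

The -344 hits and +380 misses together account for the +36 new lines, which is why coverage drops by 0.89 points.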
airflow/utils/dag_processing.py
@@ -569,7 +569,7 @@ def __init__(self,
 # Map from file path to the processor
 self._processors: Dict[str, AbstractDagFileProcessorProcess] = {}

-self._heartbeat_count = 0
+self._no_run = 0
Suggested change:
-self._no_run = 0
+self._num_run = 0
Please? Seeing _no_run, I expected it to be a boolean/flag variable.
I renamed this variable.
Other than changing no -> num, this looks good, and it removes the confusion of having a "heartbeat" method that is unlike the heartbeat method on Jobs (this one doesn't update a row in the DB like the other one does).
Pre-emptive approve, but please change the variable name.
(cherry picked from commit 83d826b)
Extract start_new_processes and prepare_file_path_queue
These functions are independent of each other, and as separate methods they are easier to understand. A good method name is the best documentation.
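A minimal sketch of the split described above, assuming a heavily simplified manager: one method only fills the file path queue, the other only starts processors until a parallelism limit is reached. ManagerSketch, FakeProcessor and the skip-if-queued rule are hypothetical stand-ins, not Airflow's actual DagFileProcessorManager code.

```python
from collections import deque
from typing import Deque, Dict, List


class FakeProcessor:
    """Hypothetical stand-in for a DAG file processor process."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path


class ManagerSketch:
    """Hypothetical model of the two extracted, independent steps."""

    def __init__(self, file_paths: List[str], parallelism: int) -> None:
        self.file_paths = file_paths
        self.parallelism = parallelism
        self.file_path_queue: Deque[str] = deque()
        self.processors: Dict[str, FakeProcessor] = {}

    def prepare_file_path_queue(self) -> None:
        # Queue every known file that is neither running nor already queued.
        queued = set(self.file_path_queue)
        for path in self.file_paths:
            if path not in self.processors and path not in queued:
                self.file_path_queue.append(path)

    def start_new_processes(self) -> None:
        # Start processors from the queue until the parallelism limit is hit.
        while len(self.processors) < self.parallelism and self.file_path_queue:
            path = self.file_path_queue.popleft()
            self.processors[path] = FakeProcessor(path)


manager = ManagerSketch(["a.py", "b.py", "c.py"], parallelism=2)
manager.prepare_file_path_queue()
manager.start_new_processes()
print(sorted(manager.processors), list(manager.file_path_queue))
```

Because neither method reads state the other one produces mid-call, each can be read and tested on its own, which is the point of giving each a descriptive name.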
I also renamed _heartbeat_count to _no_run because the new name better describes its role: when DagFileProcessorManager is running in asynchronous mode, this variable can increase independently of any heartbeat.
Move emit_metrics call
It is not related to generating the file_path_queue, so it should be called by the caller.
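The idea of moving the metrics call out of the queue builder can be sketched as follows, assuming the queue helper is reduced to a pure function and metrics emission is passed in as a callback. The function names and the callback parameter are hypothetical illustrations, not the PR's actual signatures.

```python
from typing import Callable, Iterable, List


def prepare_file_path_queue(
    file_paths: Iterable[str], in_progress: Iterable[str]
) -> List[str]:
    """Hypothetical: build the queue only, with no metrics side effects."""
    busy = set(in_progress)
    return [path for path in file_paths if path not in busy]


def refill_queue(
    file_paths: Iterable[str],
    in_progress: Iterable[str],
    emit_metrics: Callable[[], None],
) -> List[str]:
    # The caller owns the metrics call, so the helper stays single-purpose.
    queue = prepare_file_path_queue(file_paths, in_progress)
    emit_metrics()
    return queue


emitted: List[str] = []
queue = refill_queue(["a.py", "b.py"], ["a.py"], lambda: emitted.append("metrics"))
print(queue, emitted)
```

Keeping the side effect in the caller means the helper can be tested without mocking any metrics client.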
Inline heartbeat method
This method does not make sense on its own, because every call performs a full loop iteration. It only hides the logic and makes the code difficult to understand.
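A small before/after sketch of inlining the heartbeat method, assuming a loop step made of three calls. The step order and class names here are hypothetical; the sketch records the calls to show that inlining changes the structure of the code but not the order of operations.

```python
from typing import List


class _Steps:
    """Hypothetical loop steps that record the order in which they run."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def _kill_timed_out_processors(self) -> None:
        self.calls.append("kill_timed_out")

    def collect_results(self) -> None:
        self.calls.append("collect_results")

    def start_new_processes(self) -> None:
        self.calls.append("start_new_processes")


class HeartbeatStyle(_Steps):
    """Before: one full loop step hidden behind a method named heartbeat()."""

    def heartbeat(self) -> None:
        self._kill_timed_out_processors()
        self.collect_results()
        self.start_new_processes()

    def loop(self, iterations: int) -> None:
        for _ in range(iterations):
            self.heartbeat()


class InlinedStyle(_Steps):
    """After: the same steps spelled out directly in the loop body."""

    def loop(self, iterations: int) -> None:
        for _ in range(iterations):
            self._kill_timed_out_processors()
            self.collect_results()
            self.start_new_processes()


before, after = HeartbeatStyle(), InlinedStyle()
before.loop(2)
after.loop(2)
print(before.calls == after.calls)
```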
Move _kill_timed_out_processors to the loop
It is not related to collecting results in the collect_results method, so it should be called by the caller.
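A minimal sketch of what a standalone timeout step might look like once it is called from the loop rather than from collect_results, assuming each processor records its start time and exposes a kill() method. FakeProcess, the timeout value and the explicit now argument are hypothetical; the real method's details may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeProcess:
    """Hypothetical stand-in for a processor with a recorded start time."""

    file_path: str
    start_time: float
    killed: bool = False

    def kill(self) -> None:
        self.killed = True


def kill_timed_out_processors(
    processors: Dict[str, FakeProcess], timeout: float, now: float
) -> List[str]:
    """Kill and forget processors that have run longer than `timeout` seconds."""
    # Collect first, then delete, so the dict is not mutated while iterating.
    timed_out = [
        path for path, proc in processors.items() if now - proc.start_time > timeout
    ]
    for path in timed_out:
        processors[path].kill()
        del processors[path]
    return timed_out


slow = FakeProcess("a.py", start_time=0.0)
procs = {"a.py": slow, "b.py": FakeProcess("b.py", start_time=55.0)}
killed = kill_timed_out_processors(procs, timeout=50.0, now=60.0)
print(killed, list(procs))
```

Passing now explicitly keeps the sketch deterministic; a real loop would read the current time on each iteration.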
Now I have the impression that the sequence of operations is much easier to understand because it is all in the start method.
In this PR I do not want to change the sequence of operations. It is only a refactoring without functional changes.
Issue link: AIRFLOW-6897
Make sure to mark the boxes below before creating PR: [x]
[AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
* For document-only changes, the commit message can start with [AIRFLOW-XXXX].
In case of fundamental code change, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.