Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6010] Remove cyclic imports and pylint disables #6601

Merged
merged 1 commit into from
Nov 26, 2019

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Nov 18, 2019

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

There were a number of problems involving cyclic imports in
Airflow's core. Mainly about settingsi, DAG context management, base operator
imports and serialisation.

Some of those problems were workarounded by #pylint: disables (for pylint),
some of them were bypassed with TYPE_CHECKING (for mypy) and some of them were
just hidden because pylint check was splitting filei lists while TravisiCI
build.

This commit fixes most of the problems (only executor problem is left) and
removes all the workarounds.

The fixes are:

  • Context for DAG context management was loaded from settings and
    Now context managemen is moved to DAG and 'import settings' is not
    needed in baseoperator, subdag_operator.

  • Serialized Fields are lazy initialised - they were previously
    initialized while parsing the python modules which made it impossible to avoid
    cycles.

  • SerializedDagModel is removed from 'airflow.models' and imported
    directly from serialization package. This is only internal class and does not
    need to be exposed via models

  • BaseOperator in core is imported from baseoperator package
    rather than from 'airflow.models'. This helps in importing the whole airflow
    init of 'airflow' without having to pay attention
    to the sequence of imports there.

  • BaseOperator on the other hand is imported from 'airflowi.models' in
    operators/DAGs/hooks/sensors. This is important for Backporting (AIP-21)

  • The imports of BaseOperator are enforced with pre-commit.

  • All the pylint/mypy hacks related to cyclic imports are removed.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@potiuk potiuk changed the title [AIRFLOW-6010] Do not import airflow settings directly [AIRFLOW-6010] Do not import airflow settings directly. Depends on [AIRFLOW-6009] Nov 18, 2019
airflow/bin/cli.py Outdated Show resolved Hide resolved
@potiuk potiuk changed the title [AIRFLOW-6010] Do not import airflow settings directly. Depends on [AIRFLOW-6009] [AIRFLOW-6010] Do not import airflow settings directly. Nov 18, 2019
@potiuk potiuk force-pushed the do-not-import-airflow-settings branch 6 times, most recently from 3fffb90 to a7136fc Compare November 19, 2019 21:44
@potiuk potiuk changed the title [AIRFLOW-6010] Do not import airflow settings directly. [AIRFLOW-6010] Settings are not used for DAG context management Nov 19, 2019
@potiuk potiuk force-pushed the do-not-import-airflow-settings branch 3 times, most recently from c21d9d5 to bec08d0 Compare November 20, 2019 06:37
@potiuk potiuk changed the title [AIRFLOW-6010] Settings are not used for DAG context management [AIRFLOW-6010] Remove cyclic imports and pylint disables for core of Airflow Nov 20, 2019
@potiuk potiuk force-pushed the do-not-import-airflow-settings branch from bec08d0 to 98addaf Compare November 20, 2019 06:57
@potiuk potiuk changed the title [AIRFLOW-6010] Remove cyclic imports and pylint disables for core of Airflow [AIRFLOW-6010] Remove cyclic imports and pylint disables Nov 20, 2019
@potiuk
Copy link
Member Author

potiuk commented Nov 20, 2019

@ashb @feluelle - but also @kaxil and @coufon as a lot of cyclic import problems were caused by DAG serialization, so please take a close look - maybe I overlooked something.

I indeed went back to the drawing board and I think I fixed the problems with cyclic imports in simple, and elegant way. I yet have to fix migration for serialization table and run the tests, but I have all the reasons to believe, that the changes I made (mainly in imports but also few other places) are good and they will lead to much nicer and easier-to-reason-about code.

Here is the detailed description of the changes:

There were a number of problems involving cyclic imports in
Airflow's core. Mainly about settings, DAG context management, base operator
imports and serialisation.

Some of those problems were workarounded by #pylint: disables (for pylint),
some of them were bypassed with TYPE_CHECKING (for mypy) and some of them were
just hidden because pylint check was splitting file lists while TravisiCI
build.

This commit fixes most of the problems (only executor problem is left) and
removes all the workarounds.

The fixes are:

* Context for DAG context management was loaded from settings and
Now context managemen is moved to DAG and 'import settings' is not
needed in baseoperator, subdag_operator.

* Serialized Fields are lazy initialised - they were previously
initialized while parsing the python modules which made it impossible to avoid
cycles.

* SerializedDagModel is removed from 'airflow.models' and imported
directly from serialization package. This is only internal class and does not
need to be exposed via models

* BaseOperator in core is imported from baseoperator package
rather than from 'airflow.models'. This helps in importing the whole airflow
__init__ of 'airflow' without having to pay attention
to the sequence of imports there.

* BaseOperator on the other hand is imported from 'airflowi.models' in
operators/DAGs/hooks/sensors. This is important for Backporting (AIP-21)

* The imports of BaseOperator are enforced with pre-commit.

* All the pylint/mypy hacks related to cyclic imports are removed.

CONTRIBUTING.rst Outdated Show resolved Hide resolved
CONTRIBUTING.rst Outdated Show resolved Hide resolved
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much happier with the scope of this change, and the rules you've got in the contributing doc sound good to me

airflow/models/serialized_dag.py Outdated Show resolved Hide resolved
airflow/serialization/serialization.py Outdated Show resolved Hide resolved
docs/conf.py Outdated Show resolved Hide resolved
@ashb
Copy link
Member

ashb commented Nov 20, 2019

Oh we should probably put a note in UPDATING about the removal of settings.CONTEXT_MANAGER_DAG -- someone out there might be using it in custom code/a DAG generator plugin type thing?

@potiuk potiuk force-pushed the do-not-import-airflow-settings branch from 98addaf to 38594eb Compare November 20, 2019 10:46
@potiuk
Copy link
Member Author

potiuk commented Nov 20, 2019

I'm okay either way (not changing things we don't need to, but also okay with these all living in one file), though I would marginally prefer it to be called airflow.serialization and not airflow.serialization.serialization :)

@ashb - I renamaed it to "serialization_base.py -> SerializationBase class.

@potiuk
Copy link
Member Author

potiuk commented Nov 20, 2019

Oh we should probably put a note in UPDATING about the removal of settings.CONTEXT_MANAGER_DAG -- someone out there might be using it in custom code/a DAG generator plugin type thing?

Updated

@potiuk potiuk force-pushed the do-not-import-airflow-settings branch 2 times, most recently from a08a27c to bee8cdb Compare November 20, 2019 13:11
@potiuk
Copy link
Member Author

potiuk commented Nov 20, 2019

@ashb @kaxil @feluelle @coufon -> waiting for Travis CI now. Since I was modifying BaseOperator I also updated/fixed a few typehints to 3.6, fixed few - previously pylint-disabled - warnings, and also extracted (PyCharm noticed it) the object_to_json as common code between SerializedDAG and SerializedBaseOperator.

Nice code! though I hope in the future we will keep the non-cyclic imports and won't have to do unnecessary pylint: disables for them (and the TYPE_CHECKING hack was particularly "interesting" ;).

I will add a shortly the Executors separation (and enable Pylint in 'require serial' mode) which means that it will be always processing all files (not in pylint_todo.txt) in single process and in case of more cycles they should be detected immediately + add some more 'best-practices' checks in pre-commit to guide the developers in the future so that they will not introduce cycles accidentally.

UPDATING.md Show resolved Hide resolved
airflow/contrib/operators/dynamodb_to_s3.py Show resolved Hide resolved
airflow/models/__init__.py Show resolved Hide resolved
airflow/models/baseoperator.py Outdated Show resolved Hide resolved
airflow/models/baseoperator.py Show resolved Hide resolved
airflow/models/baseoperator.py Outdated Show resolved Hide resolved
airflow/models/baseoperator.py Show resolved Hide resolved
airflow/models/baseoperator.py Show resolved Hide resolved
airflow/models/baseoperator.py Show resolved Hide resolved
airflow/models/dag.py Show resolved Hide resolved
Copy link
Member

@feluelle feluelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good overall 👍

airflow/models/dag.py Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/serialization/base_serialization.py Outdated Show resolved Hide resolved
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments, otherwise looking great @potiuk!

@potiuk potiuk force-pushed the do-not-import-airflow-settings branch from 0e6b5c4 to 75d39a9 Compare November 25, 2019 21:14
@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2019

@ashb @kaxil -> all comments should be addressed.

@potiuk
Copy link
Member Author

potiuk commented Nov 26, 2019

All checks passed @kaxil :)

@kaxil
Copy link
Member

kaxil commented Nov 26, 2019

@potiuk - We have merge conflicts !

@potiuk potiuk force-pushed the do-not-import-airflow-settings branch from 75d39a9 to 7018e68 Compare November 26, 2019 09:50
@potiuk
Copy link
Member Author

potiuk commented Nov 26, 2019

@kaxil -> thanks! Rebased.

Copy link
Member

@feluelle feluelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@feluelle
Copy link
Member

feluelle commented Nov 26, 2019

Unfortunately there are k8s errors 🙈

1) FAIL: test_integration_run_dag (tests.integration.kubernetes.test_kubernetes_executor.TestKubernetesExecutor)

----------------------------------------------------------------------

   Traceback (most recent call last):

    tests/integration/kubernetes/test_kubernetes_executor.py line 198 in test_integration_run_dag

      expected_final_state='success', timeout=100)

    tests/integration/kubernetes/test_kubernetes_executor.py line 105 in monitor_task

      self.assertEqual(state, expected_final_state)

   AssertionError: 'failed' != 'success'

   - failed

   + success

@feluelle
Copy link
Member

I restarted them - they were looking flaky.

@potiuk
Copy link
Member Author

potiuk commented Nov 26, 2019

aaaaah.. another conflict, right after it passed :)

There were a number of problems involving cyclic imports in
Airflow's core. Mainly about settingsi, DAG context management, base operator
imports and serialisation.

Some of those problems were workarounded by #pylint: disables (for pylint),
some of them were bypassed with TYPE_CHECKING (for mypy) and some of them were
just hidden because pylint check was splitting filei lists while TravisiCI
build.

This commit fixes most of the problems (only executor problem is left) and
removes all the workarounds.

The fixes are:

* Context for DAG context management was loaded from settings and
Now context managemen is moved to DAG and 'import settings' is not
needed in baseoperator, subdag_operator.

* Serialized Fields are lazy initialised - they were previously
initialized while parsing the python modules which made it impossible to avoid
cycles.

* SerializedDagModel is removed from 'airflow.models' and imported
directly from serialization package. This is only internal class and does not
need to be exposed via models

* BaseOperator in core is imported from baseoperator package
rather than from 'airflow.models'. This helps in importing the whole airflow
__init__ of 'airflow' without having to pay attention
to the sequence of imports there.

* BaseOperator on the other hand is imported from 'airflowi.models' in
operators/DAGs/hooks/sensors. This is important for Backporting (AIP-21)

* The imports of BaseOperator are enforced with pre-commit.

* All the pylint/mypy hacks related to cyclic imports are removed
@potiuk potiuk force-pushed the do-not-import-airflow-settings branch from 7018e68 to 0e8be9e Compare November 26, 2019 20:29
@potiuk
Copy link
Member Author

potiuk commented Nov 26, 2019

rebased again

@kaxil kaxil merged commit 03c870a into apache:master Nov 26, 2019
@kaxil
Copy link
Member

kaxil commented Nov 26, 2019

Thanks @potiuk

@potiuk
Copy link
Member Author

potiuk commented Nov 26, 2019

Finally :). Thanks @kaxil

kaxil added a commit that referenced this pull request Dec 18, 2019
…#6601)

(cherry-picked from 03c870a)

Note: Only DAG Serialization bits have been cherry-picked
ashb pushed a commit that referenced this pull request Dec 18, 2019
…#6601)

(cherry-picked from 03c870a)

Note: Only DAG Serialization bits have been cherry-picked
ashb pushed a commit that referenced this pull request Dec 19, 2019
…#6601)

(cherry-picked from 03c870a)

Note: Only DAG Serialization bits have been cherry-picked
kaxil added a commit that referenced this pull request Dec 19, 2019
…#6601)

(cherry-picked from 03c870a)

Note: Only DAG Serialization bits have been cherry-picked
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants