Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6006] Remove dag out of airflow package import. Depends on [AIRFLOW-6004] [AIRFLOW-6005] #6598

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Nov 17, 2019

Note to reviewer: Please take a look at the last commit as this PR depends on other PRs.

s explained in https://issues.apache.org/jira/browse/AIRFLOW-6003
having classes/packages in "airflow" might lead to cyclic imports
and is genreally a bad practice. It makes more sense to import
those packages from where they belong - i.e. sub-packages of airflow.

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-XXX
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@potiuk potiuk force-pushed the remove-dag-out-of-airflow-package-import branch 7 times, most recently from c278c49 to 2d2ae55 Compare November 18, 2019 03:27
@madison-ookla
Copy link
Contributor

This seems like it's changing module locations - should a note be added to UPDATING about how to upgrade DAGs/plugins to use the appropriate import location?

@potiuk
Copy link
Member Author

potiuk commented Nov 18, 2019

@mattbowden-ookla -> yep. I plan to update the UPDATING.md once we get to agreement if this is a good approach in general and how to untangle some of the implicit dependencies we have - discussion in #6596.

@ashb ashb self-requested a review November 18, 2019 16:42
@potiuk potiuk force-pushed the remove-dag-out-of-airflow-package-import branch 5 times, most recently from 28590d8 to 4f02e45 Compare November 19, 2019 22:25
There were a number of problems involving cyclic imports in
Airflow's core. Mainly about settingsi, DAG context management, base operator
imports and serialisation.

Some of those problems were workarounded by #pylint: disables (for pylint),
some of them were bypassed with TYPE_CHECKING (for mypy) and some of them were
just hidden because pylint check was splitting filei lists while TravisiCI
build.

This commit fixes most of the problems (only executor problem is left) and
removes all the workarounds.

The fixes are:

* Context for DAG context management was loaded from settings and
Now context managemen is moved to DAG and 'import settings' is not
needed in baseoperator, subdag_operator.

* Serialized Fields are lazy initialised - they were previously
initialized while parsing the python modules which made it impossible to avoid
cycles.

* SerializedDagModel is removed from 'airflow.models' and imported
directly from serialization package. This is only internal class and does not
need to be exposed via models

* BaseOperator in core is imported from baseoperator package
rather than from 'airflow.models'. This helps in importing the whole airflow
__init__ of 'airflow' without having to pay attention
to the sequence of imports there.

* BaseOperator on the other hand is imported from 'airflowi.models' in
operators/DAGs/hooks/sensors. This is important for Backporting (AIP-21)

* The imports of BaseOperator are enforced with pre-commit.

* All the pylint/mypy hacks related to cyclic imports are removed
    There are cyclic imports detected seemingly randomly by pylint checks when some
    of the PRs are run in CI:

    ************* Module airflow.utils.log.json_formatter
    airflow/utils/log/json_formatter.py:1:0: R0401: Cyclic import
    (airflow.executors -> airflow.executors.kubernetes_executor ->
    airflow.kubernetes.pod_generator) (cyclic-import)
    airflow/utils/log/json_formatter.py:1:0: R0401: Cyclic import (airflow ->
    airflow.executors -> airflow.executors.kubernetes_executor ->
    airflow.kubernetes.pod_launcher) (cyclic-import)
    airflow/utils/log/json_formatter.py:1:0: R0401: Cyclic import
    (airflow.executors -> airflow.executors.kubernetes_executor ->
    airflow.kubernetes.worker_configuration -> airflow.kubernetes.pod_generator)
    (cyclic-import)

    The problem is that airflow's _init_ contains a few convenience imports
    (AirflowException, Executors etc.) but it also imports a number of packages
    (for example kubernetes_executor) that in turn import the airflow package
    objects - for example airflow.Executor. This leads to cyclic imports if you
    import first the executors before airflow. Similar problem happens with
    executor._init_.py containing class "Executors" imported by all executors but
    at the same time some of the executors (for example KubernetesExecutor) import
    the very same Executor class.

    It was not deterministic because pylint usually uses as many processors as
    many are available and it splits the list of .py files between the separate
    pylint processors - depending on how the split is done, pylint check might
    or might not detect it. The cycle is always detected when all files are used.

    This might happen in pylint checks in pre-commit because they split a number of
    files they process between the multiple threads you have at your machine and
    sometimes it might happen that the files are imported in different order.

    As a solution, the executors "list" should be moved to a separate module.
    Also the name of constants was changed to not to be confused with class
    names and Executors class was renamed to AvailableExecutors.

    Additionally require_serial is set in pre-commit configuration to
    make sure cycle detection is deterministic.
AirflowException should always be imported directly from 'airflow' package.
As explained in https://issues.apache.org/jira/browse/AIRFLOW-6003
having classes/packages in "airflow" might lead to cyclic imports
and is genreally a bad practice. It makes more sense to import
those packages from where they belong - i.e. sub-packages of airflow
in core and from airflow package from integrations/outside DAGs
where risk of cyclic imports is very small.
@potiuk potiuk force-pushed the remove-dag-out-of-airflow-package-import branch from 4f02e45 to 3b275a5 Compare November 23, 2019 21:37
@potiuk potiuk closed this Dec 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants